Abstract

The goal of this research is to develop agents that are adaptive and predictable and 
timely. At first blush, these three requirements seem contradictory. For example, adap- 
tation risks introducing undesirable side effects, thereby making agents' behavior less pre- 
dictable. Furthermore, although formal verification can assist in ensuring behavioral pre- 
dictability, it is known to be time-consuming. 

Our solution to the challenge of satisfying all three requirements is the following. Agents 
have finite-state automaton plans, which are adapted online via evolutionary learning (per- 
turbation) operators. To ensure that critical behavioral constraints are always satisfied, 
agents' plans are first formally verified. They are then reverified after every adaptation. 
If reverification concludes that constraints are violated, the plans are repaired. The main 
objective of this paper is to improve the efficiency of reverification after learning, so that 
agents have a sufficiently rapid response time. We present two solutions: positive re- 
sults that certain learning operators are a priori guaranteed to preserve useful classes of 
behavioral assurance constraints (which implies that no reverification is needed for these 
operators), and efficient incremental reverification algorithms for those learning operators 
that have negative a priori results. 

1. Introduction 

Agents are becoming increasingly prevalent and effective. Robots and softbots, working
individually or in concert, can relieve people of a great deal of labor-intensive tedium in their 
jobs as well as in their day-to-day lives. Designers can furnish agents with plans to perform 
desired tasks. Nevertheless, a designer cannot possibly foresee all circumstances that will 
be encountered by the agent. Therefore, in addition to supplying an agent with plans, it 
is essential to also enable the agent to learn and modify its plans to adapt to unforeseen 
circumstances. The introduction of learning, however, often makes the agent's behavior 
significantly harder to predict.^1 The goal of this research is to verify the behavior of adaptive
agents. In particular, our objective is to develop efficient methods for determining whether 
the behavior of learning agents remains within the bounds of prespecified constraints (called 
"properties") after learning. This includes verifying that properties are preserved for single 
adaptive agents as well as verifying that global properties are preserved for multiagent 
systems in which one or more agents may adapt. 

An example of a property is Asimov's First Law (Asimov, 1950). This law, which 
has also been studied by Weld and Etzioni (1994), states that an agent may not harm a 



1. Even adding a simple, elegant learning mechanism such as chunking in Soar can substantially reduce
system predictability (Soar project members, personal communication). 
human or allow a human to come to harm. The main contribution of Weld and Etzioni is
a "'call to arms:' before we release autonomous agents into real-world environments, we
need some credible and computationally tractable means of making them obey Asimov's
First Law. ... how do we stop our artifacts from causing us harm in the process of obeying
our orders?" Of course, this law is too general for direct implementation and needs to be 
operationalized into specific properties testable on a system, such as "Never delete a user's 
file." This paper addresses Weld and Etzioni's call to arms in the context of adaptive agents. 
To respond to the call to arms, we are working toward "Asimovian" adaptive agents, which 
we define to be adaptive agents that can verify, in a reasonably efficient manner, whether 
user-defined properties are preserved after adaptation.^2 Such agents will either constrain
their adaptation methods, or repair themselves in such a way as to preserve these properties. 

The verification method assumed here, model checking, consists of building a finite 
model of a system and checking whether the desired property holds in that model. In the 
context of this paper, model checking determines whether S |= P for plan S and property
P, i.e., whether plan S "models" (satisfies) property P. The output is either "yes" or "no"
and, if "no," one or more counterexamples are provided. Model checking has proven to be 
very effective for safety-critical applications, e.g., a model checker uncovered a potentially 
disastrous error in a system designed to make buildings more earthquake resistant. This 
error would have unleashed a structural force to worsen earthquake vibrations, rather than 
dampen them (Elseaidy et al., 1994). 

Essentially, model checking is brute force search through the set of all reachable states of 
the plan to check if the property holds. If the plan has a finite number of states, this process 
terminates. Model checking global properties of a multiagent plan has time complexity that 
is exponential in the number of agents.^3 With a large number of agents, this could be
a serious problem. In fact, even model checking a single agent plan with a huge number 
of states can be computationally prohibitive. A great deal of research in the verification 
community is currently focused on reduction techniques for handling very large state spaces 
(Clarke & Wing, 1997). One of the largest systems model checked to date using these 
reduction techniques had 10^120 states (Burch et al., 1994). Nevertheless, the applicability
of many of these reduction techniques is restricted and few are completely automated. 
Furthermore, none of them are tailored for efficient reverification after learning has altered 
the system. Some methods in the literature are designed for software that changes. One 
that emphasizes efficiency, as ours does, is Sokolsky and Smolka's (1994). However, none
of them, including Sokolsky and Smolka's method, are applicable to multiagent systems in 
which a single agent could adapt, thereby altering the global behavior of the overall system. 
In contrast, our approach addresses the timeliness of adaptive multiagent systems. 
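
The brute-force search described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the model checker used in this work: the plan is a hypothetical dictionary mapping each state to a list of (joint action, next state) pairs, and the property is an Invariance-style constraint saying that a forbidden joint action never labels a reachable transition. All state and action names are invented for illustration.

```python
from collections import deque

def check_invariance(initial_states, transitions, forbidden):
    """Search all reachable states; return (True, None) if no reachable
    transition is labeled with a forbidden action, else (False, path)."""
    # Each queue entry is (state, list of actions taken to reach it).
    queue = deque((s, []) for s in initial_states)
    visited = set(initial_states)
    while queue:
        state, path = queue.popleft()
        for action, nxt in transitions.get(state, []):
            if forbidden(action):
                return False, path + [action]  # counterexample sequence
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [action]))
    return True, None

# Hypothetical two-state plan; names are illustrative only.
toy_plan = {
    "A": [(frozenset({"x"}), "A"), (frozenset({"y"}), "B")],
    "B": [(frozenset({"x", "z"}), "A")],
}
# Forbidden: any joint action in which x and z occur together.
is_bad = lambda action: {"x", "z"} <= action
ok, counterexample = check_invariance(["A"], toy_plan, is_bad)
```

Because each state is visited at most once, the search terminates whenever the plan has finitely many states, and a failed check returns an action sequence that serves as a counterexample.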

Consider how reverification fits into our overall adaptive agents framework. In this 
framework (see Figure 1), there are one or more agents with "anytime" plans (Grefenstette 
& Ramsey, 1992), i.e., plans that are continually executed in response to internal and
external environmental conditions. Each agent's plan is assumed to be in the form of a 
finite-state automaton (FSA). FSAs have been shown to be effective representations of 



2. They are also called APT agents because they are adaptive, predictable and timely. 

3. The states in a multiagent plan are formed by taking the Cartesian product of states in the individual 
agent plans (see Section 3). 



Figure 1: Verifiable adaptive agents. 



reactive agent plans/strategies (Burkhard, 1993; Kabanza, 1995; Carmel & Markovitch, 
1996; Fogel, 1996). 

Let us begin with step 1 in Figure 1. There are at least a couple of ways that the FSA 
plans could be formed initially. For one, a human plan designer could engineer the initial 
plans. This may require considerable effort and knowledge. An appealing alternative is to 
evolve (i.e., learn using evolutionary algorithms) the initial plans in a simulated environ- 
ment. Fogel (1996) outlines a procedure for evolving FSAs that is effective for a number of 
problems, including an iterated version of the Prisoner's Dilemma. 

Human plan engineers or evolutionary algorithms can develop plans that satisfy an 
agent's goals to a high degree. However, to provide strict behavioral guarantees, formal 
verification is also required. Therefore we assume that prior to fielding the agents, the 
(multi)agent plan has been verified offline to determine whether it satisfies critical properties 
(steps 2 and 3). If not, the plan is repaired (step 4). Plan repair is not addressed in this 
paper, although it is an important topic for future research. Steps 2 through 4 require some 
clarification. If there is a single agent, then it has one FSA plan and that is all that is 
verified and repaired, if needed. We call this SIT1agent. (This notation, as well as other
notation used in the paper, is included in the glossary of Appendix A.) If there are multiple
agents that cooperate, we consider two possibilities. In SIT1plan, every agent uses the same
multiagent plan, which is a "product" of the individual agent plans. This multiagent plan
is formed and verified to see if it satisfies global multiagent coordination properties. The
multiagent plan is repaired if verification produces any errors, i.e., failure of the plan to
satisfy a property. In SITmultplans, each agent independently uses its own individual plan.
To verify global properties, one of the agents takes the product of these individual plans to
form a multiagent plan. This multiagent plan is what is verified. For SITmultplans, one or
more individual plans are repaired if the property is not satisfied. 

After the initial plan(s) have been verified and repaired, the agents are fielded. While 
fielded (online), the agents apply learning (e.g., evolutionary operators) to their plan(s) 
as needed (step 5). Learning may be required to adapt the plan to handle unexpected 
situations or to fine-tune the plan. If SIT1agent or SIT1plan, the single (multi)agent plan
is adapted. If SITmultplans, an agent adapts its own FSA, after which the multiagent
(product) plan is re-formed. For all situations, one agent then rapidly reverifies the new 
(multi)agent plan to ensure it still satisfies the required properties (steps 6 and 7). Re- 
formation of the multiagent plan and reverification are required to be as time-efficient 
as possible because they are performed online, perhaps in a highly time-critical situation. 
Whenever (re)verification fails, it produces a counterexample that is used to guide the choice
of an alternative learning operator or other plan repair as needed (step 8). This process of 
executing, adapting, and reverifying plans cycles indefinitely as needed. The main focus of 
this paper is steps 6 and 7. 

Rapid reverification after learning is a key to achieving timely agent responses. Our long- 
term goal is to examine all learning methods and important property classes to determine 
the quickest reverification method for each combination of learning method and property 
class. In this paper we present new results that certain useful learning operators are a 
priori guaranteed to be "safe" with respect to important classes of properties. In other 
words, if the property holds for the plan prior to learning, then it is guaranteed to still 
hold after learning.^4 If an agent uses these learning operators, it will be guaranteed to
preserve the properties with no reverification required, i.e., steps 6 through 8 in Figure 1 
need not be executed. This is the best one could hope for in an online situation where rapid 
response time is critical. For other learning operators and property classes our a priori 
results are negative. However, for the cases in which we have negative results, we present 
novel incremental reverification algorithms. These methods localize the reverification in 
order to save time over total reverification from scratch.^5 We also present a novel algorithm
for efficiently re-forming a multiagent plan, for the situation (SITmultplans) in which there
are multiple agents, each learning independently. 

The novelty of our approach is not in machine learning or verification per se, but rather 
the synthesis of the two. There are numerous important potential applications of our 
approach. For example, if antiviruses evolve more effective behaviors to combat viruses, we 
need to ensure that they do not evolve undesirable virus-like behavior. Another example is 
data mining agents that can flexibly adapt their plans to dynamic computing environments 
but whose behavior is adequately constrained for operation within secure or proprietary 
domains. A third example is planetary rovers that adapt to unforeseen conditions while 
remaining within critical mission parameters. Yet another example is automated factories 
that adapt to equipment failures but continue operation within essential tolerances and 
other specifications. Also, there are ongoing discussions at the Universities Space Research 
Association about launching orbiting unmanned vehicles to run laboratory experiments. 
The experiments would be semiautomated, and would thus require both adaptation and 
behavioral assurances. 

The last important application that we will mention is in the domain of power grid 
and telecommunications networks. The following is an event that occurred (The New York
Times, September 21, 1991, Business Section). In 1991 in New York, local electric utilities 
had a demand overload. In attempting to assist in solving the regional shortfall, AT&T 
put its own generators on the local power grid. This was a manual adaptation, but such 

4. This idea of property-preserving learning transformations was first introduced by Gordon (1998). 

5. Incremental methods are often used in computer science for improving the time-efficiency of software. 
adaptations are expected to become increasingly automated in the future. As a result of
AT&T's actions, there was a local power overload and AT&T lost its own power, which 
resulted in a breakdown of the AT&T regional communications network. The regional net- 
work breakdown propagated to create a national breakdown in communications systems. 
This breakdown also triggered failures of many other control networks across the country, 
such as the air traffic control network. Air travel nationwide was shut down. In the future, 
it is reasonable to expect that some network controllers will be implemented using multiple, 
distributed cooperating software agents. This example dramatically illustrates the poten- 
tial vulnerability of our national resources unless these agents satisfy all of the following 
criteria: continuous execution/monitoring, flexible adaptation to failures, safety/reliability,
and timely responses. Our approach ensures that agents satisfy all of these. 

This paper is organized as follows. Section 2 provides an illustrative example that is used 
throughout the paper. Section 3 has the necessary background definitions of FSAs, property 
types, formal verification, and machine learning operators. A priori results for specific 
machine learning operators are in Section 4. These learning operators alter automaton edges 
and the transition conditions associated with edges. A transition condition specifies the 
condition under which a state-to-state transition may be made. We present positive a priori 
results for some of these operators, where a "positive a priori result" means that the learning 
operator preserves a specified class of properties. On the other hand, counterexamples are 
presented to show that some of the learning operators do not necessarily preserve these 
properties. Section 5 extends the a priori results for the multiagent situation SITmultplans.

For all cases where we obtain negative a priori results, Section 6 provides incremental
algorithms for re-forming the multiagent plan and reverifying it, along with a worst-case 
complexity analysis and empirical time complexity results. The empirical results show 
as much as a half-billion-fold speedup for one of the incremental algorithms over standard
verification. The paper concludes with a discussion of related work and ideas for future 
research. 

2. Illustrative Example 

We begin with a multiagent example for SIT1plan or SITmultplans that is used throughout the
paper to illustrate the definitions and ideas. The section starts by addressing SITmultplans,
where multiple agents have their own independent plans. Later in the section we address
SIT1plan, where each agent uses a joint multiagent plan.

Imagine a scenario where a vehicle has landed on a planet for the purpose of exploration 
and sample collection, for example as in the Pathfinder mission to Mars. Like the Pathfinder, 
there is a lander (called agent "L") from which a mobile rover emerges. However, in this 
case there are two rovers: the far ("F") rover for distant exploration, and the intermediary 
("I") rover for transferring data and samples from F to L. 

We assume an agent designer has developed the initial plans for F, I, and L, shown 
in Figures 2 and 3. These are simplified, rather than realistic, plans - for the purpose of 
illustration. Basically, rover F is either collecting samples/data (in state COLLECTING) or 
it is delivering them to rover I (when F is in its state DELIVERING). Rover I can either be 
receiving samples/data from rover F (when I is in its RECEIVING state) or it can deliver 
them to lander L (when it is in its DELIVERING state). If L is in its RECEIVING state,

Figure 2: Plans for rovers F (left) and I (right).

Figure 3: Plan for the lander L.



then it can receive the samples/data from I. Otherwise, L could be busy transmitting data 
to Earth (in state TRANSMITTING) or pausing between actions (in state PAUSING). 

As mentioned above, plans are represented using FSAs. An FSA has a finite set of states 
(i.e., the vertices) and allowable state-to-state transitions (i.e., the directed edges between 
vertices). The purpose of having states is to divide the agent's overall task into subtasks. 
A state with an incoming arrow not from any other state is an initial state. Plan execution 
begins in an initial state. 

Plan execution occurs as the agent takes actions, such as agent F taking action F-collect 
or F-deliver. Each agent has a repertoire of possible actions, a subset of which may be 
taken from each of its states. A plan designer can specify this subset for each state. The 
choice of a particular action from this subset is modeled in the FSA as nondeterministic. 
It is assumed that further criteria, not specified here, are used to make the final run-time 
choice of a single action from a state. 

Let us specify the set of actions for each of the agents (F, I, L) in our example. F has 
two possible actions: F-collect and F-deliver. The first action means that F collects samples 
and/or data, and the second action means that it delivers these items to I. Rover I also 
has two actions: I-receive and I-deliver. The first action means I receives samples/data 
from F, and the second means that it delivers these items to L. L has three actions: L- 
transmit, L-pause, and L-receive. The first action means L transmits data to Earth, the 
second that it pauses between operations, and the third that it receives samples/data from 
I. For each FSA, the set of allowable actions from each state is specified in Figures 2 and 3 
in small font next to the state. For example, rover F can only take action F-deliver from
its DELIVERING state. 

The transition conditions (i.e., the logical expressions labeling the edges) in an FSA plan 
describe the set of actions that enable a state-to-state transition to occur. The operator ∧
means "AND," ∨ means "OR," and ¬ means "NOT." The condition "else" will be defined
shortly. The transition conditions of one agent can refer to the actions of one or more other 
agents. This is because each agent is assumed to be reactive to what it has observed other 
agents doing. If not visible, agents communicate their action choice. 

Once an agent's action repertoire and its allowable actions from each state have been 
defined, "else" can be defined. The transition condition "else" labeling an outgoing edge 
from a state is an abbreviation denoting the set of all remaining actions that may be taken 
from the state that are not already covered by other transition conditions. For example, 
in Figure 3, L's three transition conditions from state TRANSMITTING are (I-receive ∧
L-transmit), (I-receive ∧ L-pause), and "else." L can only take L-transmit or L-pause from
this state. However, rover I could take I-deliver instead of I-receive. Therefore, in this case
"else" is equivalent to ((I-deliver ∧ L-transmit) ∨ (I-deliver ∧ L-pause)).

An FSA plan represents a set of allowable action sequences. In particular, a plan is the 
set of all action sequences that begin in an initial state and obey the transition conditions. 
An example action sequence allowed by F's plan is ((F-collect ∧ I-deliver), (F-collect ∧
I-receive), (F-deliver ∧ I-receive), ...) where F takes its actions and observes I's actions at
each step in the sequence. 
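
Returning to the "else" abbreviation defined above, the following Python sketch shows how it could be computed mechanically for L's TRANSMITTING state. It assumes, as a simplification, that each joint action is encoded as an (I action, L action) pair and that a transition condition is a set of such pairs; this encoding is an illustrative choice, not notation from the paper.

```python
from itertools import product

# Actions available while L is in TRANSMITTING (from the text):
# L can only take L-transmit or L-pause; I can take I-receive or I-deliver.
i_actions = {"I-receive", "I-deliver"}
l_actions_from_transmitting = {"L-transmit", "L-pause"}

# Every joint (I, L) action possible from this state.
all_joint = set(product(i_actions, l_actions_from_transmitting))

# Joint actions already covered by the explicitly labeled edges.
covered = {("I-receive", "L-transmit"), ("I-receive", "L-pause")}

# "else" abbreviates whatever remains uncovered.
else_condition = all_joint - covered
```

Here else_condition contains exactly the two pairs (I-deliver, L-transmit) and (I-deliver, L-pause), matching the expansion of "else" given for this state.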

At run-time, these FSA plans are interpreted in the following manner. At every discrete 
time step, every agent (F, I, L) is at one of the states in its plan, and it selects the next 
action to take. Agents choose their actions independently. They do not need to synchronize 
on action choice. The choice of action might be based, for example, on sensory inputs from 
the environment. Although a complete plan would include the basis for action choice, as 
mentioned above, here we leave it unspecified in the FSA plans. Our rationale for doing 
this is that the focus of this paper is on the verification of properties about correct
action sequences. The basis for action choice is irrelevant to these properties. 

Once each agent has chosen an action, each agent is assumed to observe the actions
of the other agents that are mentioned in its FSA transition conditions. For example, F's
transition conditions mention I's actions, so F needs to observe what I did. Based on its
own action and those of the other relevant agent(s), an agent knows the next state to which 
it will transition. There is only one possible next state because the FSAs are assumed to 
be deterministic. For example, if F is in its COLLECTING state, and it chooses action 
F-collect, and it observes I taking action I-deliver, then it will stay in its COLLECTING 
state. The process of being in a state, choosing an action, observing the actions of other 
agents, then moving to a next state, is repeated indefinitely. 

So far, we have been assuming SITmultplans, where each agent has its own individual
plan. If we assume SIT1plan, then each agent uses the same multiagent plan to decide its
actions. A multiagent plan is formed by taking a "product" (defined in Subsection 3.1) 
of the plans for F, I, and L. This product models the synchronous behavior of the agents, 
where "synchronous" means that at each time step every agent takes an action, observes 
actions of other agents, and then transitions to a next state. The product plan is formed, 
essentially, by taking the Cartesian product of the individual automaton states and the
intersection of the transition conditions. Multiagent actions enable state-to-state transitions
in the product plan. For example, if the agents jointly take the actions F-deliver,
I-receive, and L-transmit, then all agents will transition from the joint state (COLLECTING,
RECEIVING, TRANSMITTING) to the joint state (DELIVERING, DELIVERING,
RECEIVING) represented by triples of states in the FSAs for F, I, and L. A multiagent plan
consists of the set of all action sequences that begin in a joint initial state of the product 
plan and obey the transition conditions. 
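
The product construction just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the two agents X and Y, their states, and their actions are hypothetical, and each transition condition is written out explicitly as a set of joint (X action, Y action) pairs so that intersecting conditions reduces to set intersection. The formal definition of the product appears in Subsection 3.1.

```python
from itertools import product

def product_plan(fsa_a, fsa_b):
    """Each FSA is a dict {(source, target): set_of_joint_actions}.
    A product edge between paired states carries the intersection of
    the two component conditions; empty intersections are dropped."""
    result = {}
    for ((s1, t1), c1), ((s2, t2), c2) in product(fsa_a.items(), fsa_b.items()):
        common = c1 & c2
        if common:
            result[((s1, s2), (t1, t2))] = common
    return result

# Hypothetical single-agent plans (not the rover plans of Figures 2 and 3).
fsa_x = {
    ("p", "q"): {("x1", "y1"), ("x1", "y2")},
    ("q", "p"): {("x2", "y1")},
}
fsa_y = {
    ("r", "s"): {("x1", "y1"), ("x2", "y1")},
    ("s", "r"): {("x2", "y1"), ("x2", "y2")},
}
joint = product_plan(fsa_x, fsa_y)
```

In this toy case three product edges survive; the pairing of edge (p, q) with edge (s, r) is dropped because no joint action satisfies both conditions. The sketch pairs all edges and does not prune joint states that are unreachable from a joint initial state.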

Whether the situation is SITmultplans or SIT1plan, a multiagent plan needs to be formed
to verify global multiagent coordination properties (see step 2 of Figure 1). Verification of 
global properties consists of asking whether all of the action sequences allowed by the 
product plan satisfy the property. 

One class of (global) properties of particular importance, which is addressed here, is that 
of forbidden multiagent actions that we want our agents to always avoid, called Invariance 
properties. An example is property P1: ¬(I-deliver ∧ L-transmit), which states that it
should always be the case that I does not deliver at the same time that L is transmitting. 
This property prevents problems that may arise from the lander simultaneously receiving 
new data from I while transmitting older data to Earth. The second important class ad- 
dressed here is Response properties. These properties state that if a particular multiagent 
action (the "trigger") has occurred, then eventually another multiagent action (the neces- 
sary "response") will occur. An example is property P2: If F-deliver has occurred, then 
eventually L will execute L-receive. 

If the plans in Figures 2 and 3 are combined into a multiagent plan, will this multiagent 
plan satisfy properties P1 and P2? Answering this question is probably difficult or
impossible for most readers if the determination is based on visual inspection of the FSAs. Yet
there are only a couple of very small, simple FSAs in this example! This illustrates how 
even a few simple agents, when interacting, can exhibit complex global behaviors, thereby 
making global agent behavior difficult to predict. Clearly there is a need for rigorous be- 
havioral guarantees, especially as the number and complexity of agents increases. Model 
checking fully automates this process. According to our model checker, the product plan 
for F, I, and L satisfies properties P1 and P2.

Rigorous guarantees are also needed after learning. Suppose lander L's transmitter 
gets damaged. Then one learning operator that could be applied is to delete L's action 
L-transmit, which thereafter prevents this action from being taken from state TRANS- 
MITTING. After applying a learning operator, reverification may be required. For this 
particular operator (deleting an action), no reverification is needed (see Section 4). 
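
As a rough illustration of the action-deletion operator mentioned above, the Python sketch below removes L-transmit from the actions allowed in state TRANSMITTING. The table-of-sets representation and the function name delete_action are assumptions made for this example; the operator itself is defined formally in Section 3.

```python
def delete_action(allowed, state, action):
    """Return a copy of the allowed-action table with one action removed
    from one state. The input table is left unchanged."""
    updated = {s: set(acts) for s, acts in allowed.items()}
    updated[state].discard(action)
    return updated

# L's allowable actions from TRANSMITTING, as stated in the text.
l_allowed = {"TRANSMITTING": {"L-transmit", "L-pause"}}
l_after = delete_action(l_allowed, "TRANSMITTING", "L-transmit")
```

After the deletion, the actions allowed from TRANSMITTING are a subset of those allowed before, so no new action choices are introduced. This observation is only meant as intuition; the actual preservation results are those of Section 4.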

In a multiagent situation, what gets modified by learning? Who forms and verifies the 
product FSA? And who performs repairs if verification fails, and what is repaired? The 
answers to these questions depend on whether it is SIT1plan or SITmultplans. If SIT1plan,
the agent with the greatest computational power, e.g., lander L in our example, maintains
the product plan by applying learning to it, verifying it, repairing it as needed, and then
sending a copy of it to all of the agents to use. If SITmultplans, an agent applies learning to
its own individual plan. The individual plans are then sent to the computationally powerful 
agent, who forms the product and verifies that properties are satisfied. If repairs are needed, 
one or more agents repair their own individual plans. 
It is assumed here that machine learning operators are applied one-at-a-time per agent
rather than in batch and, if SITmultplans, the agents co-evolve plans by taking turns learning
(Potter, 1997). Beyond these assumptions, this paper does not focus on the learning opera- 
tors per se (other than to define them). It focuses instead on the outcome resulting from the 
application of a learning operator. In particular, we address the reverification issue. The 
next section gives useful background definitions needed for understanding reverification. 

3. Preliminary Definitions 

This section provides definitions of FSAs, properties, verification, and machine learning 
operators. For a clear, unambiguous understanding of the results in this paper, many of 
these definitions are formal. 

3.1 Automata for Agents' Plans 

FSAs have at least four advantages over classical plans (Nilsson, 1980; Dean & Wellman,
1991). For one, unlike classical plans, the type of finite-state automaton plans used here
allows potentially infinite (indeterminate) length action sequences.^6 This provides a good
model of embedded agents that are continually responsive to their environment without 
any artificial termination to their behavior. Execution and learning may be interleaved in 
a natural manner. Another advantage is that FSA plans have states, and the plan designer 
can use these states to represent subtasks of the overall task. This subdivides the plan into 
smaller units, thereby potentially increasing the comprehensibility of the plans. States also 
enable different action choices at different times, even if the sensory inputs are the same. 
A third advantage of FSA plans is that they are particularly well-suited to modeling the 
concurrent behavior of multiple agents. An arbitrary number of single-agent plans can be 
developed independently and then composed into a synchronous multiagent plan (for which 
global properties may be tested) in a straightforward manner. Finally, FSA plans can be 
verified using the very popular and effective automata-theoretic model checking methods, 
e.g., see Kurshan (1994). 

A disadvantage of FSA plans as opposed to classical plans is that there is a great deal 
of research that has been done on automatically forming classical plans, e.g., see Dean and 
Wellman (1991). It is unclear how much of this might be applicable to FSAs. On the 
other hand, evolutionary algorithms can be used to evolve FSA plans (Fogel, 1996). A 
disadvantage of FSA plans as opposed to plans composed of rule sets is that the latter may 
express a plan more succinctly. Nevertheless for plans that require formal verification, FSAs 
are preferable because the complex interactions that can occur between rules make them 
very hard to verify. Formal verification for FSAs is quite sophisticated and widely used in 
safety-critical industrial applications. 

This subsection, which is based on Kurshan (1994), briefly summarizes the basics of the 
FSAs used to model agent plans. Figures 2 and 3 illustrate the definitions. This paper 
focuses on FSAs that model agents with a potentially infinite lifetime, represented as an 
infinite-length "string" (i.e., a sequence of actions).



6. Results for agents with finite lifetimes may be found in Gordon (1998, 1999). 
Before beginning our discussion of automata, we briefly digress to define Boolean
algebra. Examples throughout this paper have automaton transition conditions expressed
in Boolean algebra, because Boolean algebra succinctly summarizes these transition condi- 
tions. Boolean algebra is also useful for succinctly expressing the properties. Furthermore, 
it is easier for us to describe two of the incremental reverification algorithms if we use 
Boolean algebra notation. Therefore, we briefly summarize the basics of Boolean algebra. 

A Boolean algebra K is a set of elements with distinguished elements 0 and 1, closed
under the Boolean ∧, ∨, and ¬ operations, and satisfying the standard properties (Sikorski,
1969). For elements x and y of K, x ∧ y is called the meet of x and y, x ∨ y is called the join
of x and y, and ¬x is called the complement of x. For those readers who are unfamiliar with
Boolean algebras and who want some intuition for these operations, it may help to imagine
that each element of K is itself a set, e.g., a set of actions. Meet, join, and complement
would then be set intersection, union, and complement, respectively. Elements 0 and 1,
in this case, would be the empty set (∅) and the set of all elements in the universe (U),
respectively.

The Boolean algebras are assumed to be finite. There is a partial order among the
elements, which is defined as x ≤ y if and only if x ∧ y = x. It may help to think of ≤ as
analogous to ⊆ for sets. The elements 0 and 1 are defined as ∀x ∈ K, 0 ≤ x, and ∀x ∈ K,
x ≤ 1. The atoms (analogous to single-element sets) of K, Γ(K), are the nonzero elements
of K minimal with respect to ≤. In the rovers example, agents F, I, and L each have their
own Boolean algebra with its atoms. The atoms of F's Boolean algebra are its actions
F-collect and F-deliver; the atoms of I's algebra are I-receive and I-deliver; the atoms of L's
algebra are L-transmit, L-pause, and L-receive. The element (F-collect ∨ F-deliver) of F's
Boolean algebra describes the set of actions {F-collect, F-deliver}.

A Boolean algebra $\mathcal{K}_i$ is a subalgebra of $\mathcal{K}$ if $\mathcal{K}_i$ is a
nonempty subset of $\mathcal{K}$ that is closed under the operations $\wedge$, $\vee$, and
$\neg$, and also has the distinguished elements 0 and 1. $\prod \mathcal{K}_i$ is the product
algebra of subalgebras $\mathcal{K}_i$. An atom of the product algebra is the meet of the atoms
of the subalgebras. For example, if $a_1, \ldots, a_n$ are atoms of subalgebras
$\mathcal{K}_1, \ldots, \mathcal{K}_n$ respectively, then $a_1 \wedge \ldots \wedge a_n$ is an
atom of $\prod_{i=1}^{n} \mathcal{K}_i$.

The Boolean algebra $\mathcal{K}_F$ for agent F's actions is the smallest one containing the
atoms of F's algebra. It contains all Boolean elements formed from F's atoms using the Boolean
operators $\wedge$, $\vee$, and $\neg$, including 0 and 1. These same definitions hold for I
and L's algebras $\mathcal{K}_I$ and $\mathcal{K}_L$. $\mathcal{K}_F \mathcal{K}_I \mathcal{K}_L$
is the product algebra used for all transition conditions in the multiagent plan (i.e., the
product of the F, I, and L FSAs). One atom of the product algebra
$\mathcal{K}_F \mathcal{K}_I \mathcal{K}_L$ is (F-collect $\wedge$ I-receive $\wedge$ L-pause).
This is the form of actions taken simultaneously by the three agents. Algebras $\mathcal{K}_F$,
$\mathcal{K}_I$, and $\mathcal{K}_L$ are subalgebras of the product algebra
$\mathcal{K}_F \mathcal{K}_I \mathcal{K}_L$.
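The set intuition for Boolean algebras can be made concrete. The following is a minimal sketch, assuming that every element of a finite Boolean algebra is represented as a Python frozenset of atoms. The helper names (meet, join, complement, leq, product_atoms) are illustrative and do not come from the paper.

```python
from itertools import product

# Atoms of each agent's algebra (action names from the rovers example).
F_ATOMS = frozenset({"F-collect", "F-deliver"})
I_ATOMS = frozenset({"I-receive", "I-deliver"})
L_ATOMS = frozenset({"L-transmit", "L-pause", "L-receive"})

def meet(x, y):
    """Meet of two elements: set intersection."""
    return x & y

def join(x, y):
    """Join of two elements: set union."""
    return x | y

def complement(x, one):
    """Complement of x relative to the element 1 (the set of all atoms)."""
    return one - x

def leq(x, y):
    """Partial order: x <= y if and only if meet(x, y) == x."""
    return meet(x, y) == x

def product_atoms(*atom_sets):
    """Atoms of a product algebra: one atom chosen from each subalgebra."""
    return frozenset(product(*atom_sets))

f_collect = frozenset({"F-collect"})
f_either = join(f_collect, frozenset({"F-deliver"}))   # (F-collect v F-deliver)
joint_atoms = product_atoms(F_ATOMS, I_ATOMS, L_ATOMS)
```

In this representation a product atom such as (F-collect $\wedge$ I-receive $\wedge$ L-pause) becomes the tuple ("F-collect", "I-receive", "L-pause"), and the three agents together have 2 x 2 x 3 = 12 joint atoms.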

Let us return now to automata. Formally, an FSA of the type considered here is a
three-tuple $S = (V(S), M_{\mathcal{K}}(S), I(S))$ where $V(S)$ is the set of vertices (states)
of $S$, $\mathcal{K}$ is the Boolean algebra corresponding to $S$,
$M_{\mathcal{K}}(S) : V(S) \times V(S) \to \mathcal{K}$ is the matrix of transition conditions
which are elements of $\mathcal{K}$, and $I(S) \subseteq V(S)$ are the initial states.^7 Also,
$E(S) = \{e \in V(S) \times V(S) \mid M_{\mathcal{K}}(e) \neq 0\}$ is the set of directed edges
connecting pairs of vertices of $S$. $M_{\mathcal{K}}(e)$, which is an abbreviation for
$M_{\mathcal{K}}(S)(e)$, is the transition condition of $M_{\mathcal{K}}(S)$ corresponding to
edge $e$. Note that we omit edges labeled "0." By our definition, an edge whose transition
condition is 0 does not exist. We can alternatively denote $M_{\mathcal{K}}(e)$ as
$M_{\mathcal{K}}(v_i, v_j)$ for the transition condition corresponding to the edge going from
vertex $v_i$ to vertex $v_j$. For example, in Figure 3,
$M_{\mathcal{K}}$((TRANSMITTING, PAUSING)) is (I-receive $\wedge$ L-pause).

7. There should also be an output subalgebra, as in Kurshan (1994). This would help distinguish an agent's
own actions from those of other agents. However, it is omitted here for notational simplicity.

[Figure 4: Part of the product plan for agents F, I, and L. The recoverable content is the
pair of product states (COLLECTING, RECEIVING, TRANSMITTING) and (DELIVERING, DELIVERING,
RECEIVING) and the transition condition (F-deliver $\wedge$ I-receive $\wedge$ L-transmit).]

Figures 2 and 3 illustrate these FSA definitions. There are FSA plans for three agents, 
F, I, and L with vertices, edges, and transition conditions. An incoming arrow to a state, 
not from any other state, signifies that this is an initial state. 

A multiagent plan is formed from single agent plans by taking the tensor product (also
called the "synchronous product" or simply "product") of the FSAs corresponding to the
individual plans. Formally, the tensor product is defined as:

$$\bigotimes_{i=1}^{n} S_i = \left( \times V(S_i),\ \otimes_i M(S_i),\ \times I(S_i) \right)$$

where $\times$ is the Cartesian product, and the tensor product
$M(S_1) \otimes \ldots \otimes M(S_n)$ of $n$ transition matrices is defined as
$(M(S_1) \otimes \ldots \otimes M(S_n))((v_1, v_1'), \ldots, (v_n, v_n')) =
M(S_1)(v_1, v_1') \wedge \ldots \wedge M(S_n)(v_n, v_n')$ for
$(v_1, v_1') \in E(S_1), \ldots, (v_n, v_n') \in E(S_n)$. In words, the product FSA is formed
by taking the Cartesian product of the vertices and the intersection of the transition
conditions. Initial states of the product FSA are tuples formed from the initial states of the
individual FSAs.

The product FSA models a set of synchronous FSAs. The Boolean algebra corresponding to the
product FSA is the product algebra. For Figures 2 and 3, to formulate the FSA $S$ modeling the
entire multiagent plan, we take the tensor product $S = F \otimes I \otimes L$ of the three
FSAs. For this tensor product, $I(S)$ = {(COLLECTING, RECEIVING, TRANSMITTING), (COLLECTING,
RECEIVING, PAUSING), (COLLECTING, RECEIVING, RECEIVING)}. Part of the tensor product FSA is
shown in Figure 4.
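The tensor product construction can be sketched in code. This is an illustrative sketch under a simplifying assumption that is not made in the paper: each component FSA labels its edges only with atoms of its own subalgebra, so the meet of component conditions corresponds to the Cartesian product of their atom sets. The function name tensor_product and the two toy FSAs A and B are hypothetical.

```python
from itertools import product

def tensor_product(fsas):
    """Product of FSAs given as (vertices, M, initial) triples.

    M maps an edge (v, w) to a frozenset of atoms. Product states and product
    atoms are tuples with one component per FSA.
    """
    vertices = set(product(*[f[0] for f in fsas]))
    initial = set(product(*[f[2] for f in fsas]))
    M = {}
    for src in vertices:
        for dst in vertices:
            conds = [f[1].get((v, w), frozenset())
                     for f, v, w in zip(fsas, src, dst)]
            cond = frozenset(product(*conds))  # empty if any component edge is absent
            if cond:                           # edges labeled 0 are omitted
                M[(src, dst)] = cond
    return vertices, M, initial

# Two hypothetical single-agent FSAs (not the plans of Figures 2 and 3).
A = ({"a0", "a1"},
     {("a0", "a1"): frozenset({"x"}), ("a1", "a0"): frozenset({"y"})},
     {"a0"})
B = ({"b0"},
     {("b0", "b0"): frozenset({"u", "v"})},
     {"b0"})
P_vertices, P_M, P_initial = tensor_product([A, B])
```

The product has the single initial state ("a0", "b0"), and its edge from ("a0", "b0") to ("a1", "b0") carries the joint atoms ("x", "u") and ("x", "v").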

Next we define the language of an FSA, which is the set of all action sequences permitted
by the FSA plan. To do this, we first define a string, which is a sequence of actions (atoms).
Formally, a string $\mathbf{x}$ is an infinite-dimensional vector,
$(x_0, \ldots) \in \Gamma(\mathcal{K})^{\omega}$, i.e., a string is an infinite ($\omega$)
length sequence of actions (where $\mathcal{K}$ is the Boolean algebra used by $S$). A run
$\mathbf{v}$ of string $\mathbf{x}$ is a sequence $(v_0, \ldots)$ of vertices such that
$\forall i$, $x_i \wedge M_{\mathcal{K}}(v_i, v_{i+1}) \neq 0$, i.e.,
$x_i \le M_{\mathcal{K}}(v_i, v_{i+1})$ because the $x_i$ are atoms. In other words, a run of a
string is the sequence of vertices visited in an FSA when the string satisfies the transition
conditions along the edges.

The language of FSA $S$ is defined as:

$$\mathcal{L}(S) = \{\mathbf{x} \in \Gamma(\mathcal{K})^{\omega} \mid \mathbf{x} \text{ has a run } \mathbf{v} = (v_0, \ldots) \text{ in } S \text{ with } v_0 \in I(S)\}$$

Such a run is called an accepting run, and $S$ is said to accept string $\mathbf{x}$. Any
requirement on accepting runs of an FSA is called the FSA acceptance criterion. In this case,
the acceptance criterion consists of one condition: accepting runs must begin in an initial
state. The verification literature calls these FSAs, which accept infinite-length strings,
$\omega$-automata (Kurshan, 1994).
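Because strings are infinite, a program can only follow a finite prefix of a run. The sketch below does that for a deterministic FSA with a single initial state. It assumes the transition matrix is a dictionary from edges to frozensets of atoms; the function name run_prefix and the toy FSA are hypothetical.

```python
def run_prefix(M, initial, atoms):
    """Vertices visited while reading a finite prefix of a string.

    Returns None if some atom lies below no outgoing transition condition.
    Assumes a deterministic FSA with exactly one initial state.
    """
    (v,) = initial
    run = [v]
    for a in atoms:
        # a <= M(v, w) holds exactly when atom a is a member of the condition.
        nxt = [w for (u, w), cond in M.items() if u == v and a in cond]
        if not nxt:
            return None
        v = nxt[0]
        run.append(v)
    return run

# Hypothetical two-state FSA over the atoms "go" and "wait".
toy_M = {
    ("s0", "s0"): frozenset({"wait"}),
    ("s0", "s1"): frozenset({"go"}),
    ("s1", "s0"): frozenset({"go", "wait"}),
}
toy_run = run_prefix(toy_M, {"s0"}, ["wait", "go", "wait"])
```

A prefix of length k yields k + 1 vertices, matching the definition in which $x_i$ labels the edge from $v_i$ to $v_{i+1}$.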

A few more definitions are needed. An FSA is complete if, for each state $v \in V(S)$,
$\sum_{w \in V(S)} M_{\mathcal{K}}(v, w) = 1$, where the sum denotes the join. In other words,
an FSA is complete if it specifies what state-to-state transition the agent should make for all
possible actions taken by the other agents. This is a very reasonable assumption to make
because otherwise the agent would not know what to do in some circumstances. An FSA is
deterministic at state $v$ if $w \neq w' \Rightarrow M_{\mathcal{K}}(v, w) \wedge
M_{\mathcal{K}}(v, w') = 0$. In other words, the choice of action uniquely determines which
edge will be taken from a state. An FSA is deterministic if it is deterministic at each of its
states. Unless otherwise stated, it is assumed here that all FSAs are complete and
deterministic. The restriction to deterministic FSAs is not a major problem because for every
nondeterministic FSA there is a deterministic one accepting the same language (Kurshan, 1994).
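Both conditions are easy to test when conditions are stored as sets of atoms. The following sketch assumes that representation and treats the element 1 as the set of all atoms passed in by the caller; the function names and the toy FSA are hypothetical.

```python
def is_complete(vertices, M, all_atoms):
    """Complete: at every state the join of the outgoing conditions equals 1."""
    for v in vertices:
        covered = frozenset()
        for (u, _w), cond in M.items():
            if u == v:
                covered |= cond
        if covered != all_atoms:
            return False
    return True

def is_deterministic(vertices, M):
    """Deterministic: conditions on distinct outgoing edges have meet 0."""
    for v in vertices:
        seen = set()
        for (u, _w), cond in M.items():
            if u == v:
                if seen & cond:
                    return False
                seen |= cond
    return True

ATOMS = frozenset({"go", "wait"})
STATES = {"s0", "s1"}
good_M = {
    ("s0", "s0"): frozenset({"wait"}),
    ("s0", "s1"): frozenset({"go"}),
    ("s1", "s0"): frozenset({"go", "wait"}),
}
```

Dropping the edge ("s0", "s0") from good_M would break completeness, and enlarging the condition on ("s0", "s1") to include "wait" would break determinism.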

We also need the definition of a cycle in a graph. Model checking typically consists of
looking for cycles, as described in Section 3.3. A path in FSA $S$ is a sequence of vertices
$\mathbf{v} = (v_0, \ldots, v_n) \in V(S)^{n+1}$, for $n \ge 1$ such that
$(v_i, v_{i+1}) \in E(S)$ for $i = 0, \ldots, n-1$, i.e., $M_{\mathcal{K}}(v_i, v_{i+1}) \neq 0$.
If $v_n = v_0$, then $\mathbf{v}$ is a cycle. Each cycle in an FSA plan allows the possibility
that the agent can infinitely often, or as long as desired, revisit the vertices of the cycle.
It also implies that a substring can be repeated indefinitely.

We next illustrate some of these definitions. An example string in the language of FSA
$S$, the multiagent FSA that is the product of F, I, and L, is

    ((F-collect $\wedge$ I-receive $\wedge$ L-transmit),
     (F-deliver $\wedge$ I-receive $\wedge$ L-receive),
     (F-deliver $\wedge$ I-receive $\wedge$ L-transmit),
     (F-deliver $\wedge$ I-deliver $\wedge$ L-receive), ...).

This is a sequence of atoms of $S$. A run of this string is

    ((COLLECTING, RECEIVING, TRANSMITTING),
     (DELIVERING, RECEIVING, RECEIVING),
     (DELIVERING, RECEIVING, TRANSMITTING),
     (DELIVERING, DELIVERING, RECEIVING),
     (COLLECTING, RECEIVING, RECEIVING), ...).

All FSAs in Figures 2 and 3 are complete and deterministic. For example, in Figure 2,
rover I can only take action I-deliver from its DELIVERING state. However, every possible
action choice of L determines a unique next state for I from DELIVERING. For example,
if L takes L-transmit then I must stay in state DELIVERING, and if L takes L-receive or
L-pause then I must go to state RECEIVING.

3.2 Properties

Now that we have presented the FSA formalism used for agent plans, we can address the 
question of how to formalize properties. For verification, properties are typically expressed 
either as FSAs (for automata-theoretic verification) or in temporal logic. Here, we assume 
linear temporal logic. In other words, we assume that time proceeds linearly and we do not 
consider simultaneous possible futures. Using the algorithm of Vardi and Wolper (1986), 
one can convert any linear temporal logic formula into an automaton (because automata 
are more expressive than linear temporal logic). Both representations are used here. To 
simplify our proofs in Section 4, properties are expressed in temporal logic. For some of the 
incremental reverification methods in Section 6, we use automata-theoretic methods with 
an FSA representation for the property. 

Let us begin by defining temporal logic properties. Many of the definitions are based on 
Manna and Pnueli (1991). To bridge the gap between automata (for plans) and temporal 
logic (for properties), we need to define a computational state (c-state). A computation is 
an infinite sequence of temporally-ordered atoms, i.e., a string. A c-state is an atom in a 
computation. In other words, it is a (single or multiagent) action that occurs at a single 
time step in a computation. We continue to refer to an automaton state as simply a "state." 

$P$ is a property that is true (false) for an FSA $S$, $S \models P$ ($S \not\models P$), if
and only if $P$ is true for every string in the language $\mathcal{L}(S)$ (false for some
string in $\mathcal{L}(S)$). The notation $\mathbf{x} \models P$ ($\mathbf{x} \not\models P$)
means string $\mathbf{x}$ satisfies (does not satisfy) property $P$, i.e., the property holds
(does not hold) for $\mathbf{x}$. Before defining what it means for properties to be true
(i.e., hold) for a string, we first define what it means for a formula that is a Boolean
expression to be true at a c-state. A c-state formula $p$ is true (false) at c-state $x_i$,
i.e., $x_i \models p$ ($x_i \not\models p$) if and only if $x_i \le p$ ($x_i \not\le p$), i.e.,
$x_i \wedge p \neq 0$ ($= 0$) because $p$ is a Boolean expression with no variables on the same
Boolean algebra used by FSA $S$, and $x_i$ is an atom of that algebra. For example, F-collect
$\models$ (F-collect $\vee$ F-deliver) for c-state F-collect and c-state formula (F-collect
$\vee$ F-deliver). One can also talk about a c-state formula being true or false for an atom,
since a c-state is an atom.

A c-state formula $p$ is true or false in particular c-states of a string. Property $P$ is
defined in terms of $p$, and is true or false of an entire string. In particular,
$\mathbf{x} \models P$ or $\mathbf{x} \not\models P$ for the string $\mathbf{x}$.

We focus on two property classes that are among those most frequently encountered in
the verification literature: Invariance and Response properties. Invariance and Response
properties are likely to be useful for agents. For the case of a single agent
($SIT_{1agent}$), Invariance properties can express the requirement that a particular action
never be executed.^8 Response properties are also useful for a single agent. They can be used
to verify that a pair of the agent's actions will occur in the correct order (i.e., a
"response" always follows a "trigger") in the plan. In the context of multiple agents
($SIT_{1plan}$ or $SIT_{multplans}$), Invariance properties express the need for parallel
multiagent coordination. In particular, they express that multiple agents should not
simultaneously perform some conflicting set of actions. Response properties express the need
for sequential multiagent coordination. For example, they can express the requirement that one
agent's action must follow in response to a particular "triggering" action of another agent.

8. This could alternatively be implemented as a run-time check, but then there would be no assurance that
the plan without the action is a good one, for example, in terms of how well the revised plan satisfies
the agent's goals (perhaps captured in a "fitness function"). Alternatively, the action (atom) could be
omitted from the set of actions $\Gamma(\mathcal{K})$. But in general one may not wish to rule out
actions, in case the situation and/or properties might change.

Here, we only present informal definitions of these properties; the formal definitions are
in Appendix B. An Invariance property $P = \Box \neg p$ ("Invariant not $p$") is true of a
string if $p$ is "never" true, i.e., if $p$ is not true in any c-state of the string.
$P = \Box(p \to \Diamond q)$ is a Response property, where $\Diamond$ means "eventually." We
call $p$ the "trigger" and $q$ the "response." A Response formula states that every trigger is
eventually (in finite time) followed by a response.

To illustrate these property types, we continue the rovers and lander example. The
property P1 from Section 2, which states that it should always be the case that I does
not deliver at the same time that L is transmitting, is formally expressed as an Invariance
property P1 defined as: P1 = $\Box(\neg$(I-deliver $\wedge$ L-transmit)). Property P2 from
Section 2, which states that if F-deliver has occurred then eventually L will execute
L-receive, is an example of a Response property. This is expressed in temporal logic as
P2 = $\Box$(F-deliver $\to \Diamond$ L-receive).
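The informal definitions can be tried out on infinite strings that have a finite description. The sketch below evaluates an Invariance and a Response property on an ultimately periodic string, i.e., a finite prefix followed by a loop repeated forever. Two points are assumptions of this sketch rather than statements from the paper: the restriction to ultimately periodic strings, and the reading that "eventually" includes the current c-state. The function names and the example strings are hypothetical.

```python
from itertools import product

# Joint atoms are tuples (F-action, I-action, L-action).
ALL = list(product(["F-collect", "F-deliver"],
                   ["I-receive", "I-deliver"],
                   ["L-transmit", "L-pause", "L-receive"]))
P1_BODY = {a for a in ALL if a[1] == "I-deliver" and a[2] == "L-transmit"}
TRIGGER = {a for a in ALL if a[0] == "F-deliver"}
RESPONSE = {a for a in ALL if a[2] == "L-receive"}

def holds(atom, p):
    """A c-state formula p (a set of atoms) is true at a c-state iff atom <= p."""
    return atom in p

def invariance(prefix, loop, p):
    """Box(not p) on the string prefix followed by loop repeated forever."""
    return not any(holds(a, p) for a in prefix + loop)

def response(prefix, loop, p, q):
    """Box(p -> Diamond q) on the string prefix followed by loop repeated forever."""
    if any(holds(a, q) for a in loop):
        return True            # q recurs forever, so every trigger is answered
    if any(holds(a, p) for a in loop):
        return False           # a trigger recurs, but q never occurs again
    pending = False            # only triggers in the prefix remain to be checked
    for a in prefix:
        if holds(a, p):
            pending = True
        if holds(a, q):
            pending = False
    return not pending

ok_prefix = [("F-collect", "I-receive", "L-transmit")]
ok_loop = [("F-deliver", "I-receive", "L-receive"),
           ("F-deliver", "I-deliver", "L-receive")]
bad_loop = [("F-deliver", "I-deliver", "L-transmit")]
```

The string built from ok_prefix and ok_loop satisfies both P1 and P2, while any string that ends in bad_loop violates both.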

Next consider the FSA representation for properties. As will be explained in Section 3.3
on verification, what we really need to express for automata-theoretic verification is the
negation of the property, i.e., $\neg P$. Strings in the language of FSA $\neg P$ violate
property $P$. In this paper, we assume that $\neg P$ is expressed using the popular Büchi
$\omega$-automaton (Büchi, 1962). We decided to use the Büchi FSA because one of the simplest
and most elegant model checking algorithms in the literature assumes this type of FSA for the
property, and we use that algorithm (see Subsections 3.3 and 6.1). A Büchi automaton is defined
to be a four-tuple $S = (V(S), M_{\mathcal{K}}(S), I(S), B(S))$, where $B(S) \subseteq V(S)$ is
a set of "bad" states. To define the language of a Büchi automaton, we require the following
preliminary definition. For a run $\mathbf{v}$ of FSA $S$, $\mu(\mathbf{v}) = \{v \in V(S) \mid
v_i = v$ for infinitely many $v_i$'s in run $\mathbf{v}\}$. In other words, $\mu(\mathbf{v})$
equals the set of all vertices of $S$ that occur infinitely often in the run $\mathbf{v}$. Then
for a Büchi automaton $S$, $\mathcal{L}(S) = \{\mathbf{x} \in \Gamma(\mathcal{K})^{\omega} \mid
\mathbf{x}$ has a run $\mathbf{v} = (v_0, \ldots)$ in $S$ with $v_0 \in I(S)$ and
$\mu(\mathbf{v}) \cap B(S) \neq \emptyset\}$. In other words, the Büchi automaton has an
acceptance criterion that requires visiting some bad state infinitely often, as well as
beginning in an initial state.

An example deterministic Büchi FSA for $\neg$P1, where Invariance property P1 =
$\Box \neg$(I-deliver $\wedge$ L-transmit), is in Figure 5 (on the left) with $B(\neg$P1$)$ =
{2}. Note that visiting a state in $B(\neg$P1$)$ infinitely often implies Büchi acceptance, and
because the FSA expresses the negation of the property, visiting a "bad" state in
$B(\neg$P1$)$ infinitely often is undesirable. From Figure 5 we can see that any string that
includes (I-deliver $\wedge$ L-transmit) will visit state 2 infinitely often, and
$B(\neg$P1$)$ = {2}. Thus any string that starts in state 1 and includes (I-deliver $\wedge$
L-transmit) is in $\mathcal{L}(\neg$P1$)$ and therefore violates property P1.

Next consider Response properties of the form $\Box(p \to \Diamond q)$. For this paper, the
only type of FSA that we need for verifying Response properties is the very simple
deterministic Büchi FSA for the negation of a "First-Response" property.^9 (Determinism is
needed for our efficient internal representation. See Subsection 6.1.) A First-Response
property checks whether the first trigger $p$ in every string is followed by a response $q$.
Figure 5 (on the right) shows a Büchi FSA for the First-Response property corresponding to
$\neg$P2, where property P2 = $\Box$(F-deliver $\to \Diamond$ L-receive). For this FSA,
$B(\neg$P2$)$ = {2}. Any string whose accepting run visits state 2 infinitely often will
include the first trigger and not the response that should follow it. As discussed in
Subsection 6.5, verifying First-Response properties can in some circumstances (including all of
our experiments) be equivalent to verifying the full Response property
$\Box(p \to \Diamond q)$. Henceforth, when we use the term "Response" this is assumed to
include both the full Response and the First-Response versions.

9. A straightforward inductive argument shows that it is not possible to construct a deterministic Büchi
automaton with a finite number of states for the negation of the full Response property
$\Box(p \to \Diamond q)$ (Mahesh Viswanathan, personal communication).

[Figure 5: Invariance property $\neg$P1 (left) and the First-Response version of property
$\neg$P2 (right) as Büchi FSAs, where $B(S)$ = {2} for both automata. Edge labels recoverable
from the figure: (I-deliver $\wedge$ L-transmit) and "else" on the left; F-deliver, L-receive,
and "else" on the right.]



3.3 Model Checking for Verification 

Now that we have our representations for plans and properties, it is possible to describe
model checking, i.e., for plan $S$ and property $P$ determining whether $S \models P$. First,
however, we need to begin with two essential definitions of accessibility: accessibility of one
vertex from another, and accessibility of an atom from a vertex.

Definition 1 Vertex $v_n$ is accessible from vertex $v_0$ if and only if there exists a path
from $v_0$ to $v_n$.

Definition 2 Atom $a_{n-1} \in \Gamma(\mathcal{K})$ is accessible from vertex $v_0$ if and only
if there exists a path from $v_0$ to $v_n$ and $a_{n-1} \le M_{\mathcal{K}}(v_{n-1}, v_n)$.
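Definitions 1 and 2 can be computed together with one graph traversal. The sketch below is a plain breadth-first search, assuming the transition matrix is a dictionary from edges to frozensets of atoms and that a path has at least one edge; the function name accessible and the toy matrix are hypothetical.

```python
from collections import deque

def accessible(M, start):
    """Return (vertices, atoms) accessible from `start`.

    A vertex is accessible if some path with at least one edge reaches it.
    An atom is accessible if it lies below the condition on the last edge
    of some such path.
    """
    succ = {}
    for (u, w), cond in M.items():
        if cond:
            succ.setdefault(u, []).append((w, cond))
    seen_vertices, seen_atoms = set(), set()
    expanded = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w, cond in succ.get(u, []):
            seen_atoms |= cond
            seen_vertices.add(w)
            if w not in expanded:
                expanded.add(w)
                queue.append(w)
    return seen_vertices, seen_atoms

acc_M = {
    ("s0", "s1"): frozenset({"go"}),
    ("s1", "s1"): frozenset({"wait"}),
    ("s2", "s0"): frozenset({"reset"}),
}
acc_vertices, acc_atoms = accessible(acc_M, "s0")
```

In this toy matrix the start vertex "s0" is not accessible from itself, because no path leads back to it, and the atom "reset" is not accessible from "s0".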

Accessibility from initial states is central to model checking. The reason is the following.
Recall from Section 3.2 that property $P$ is true (false) for an FSA $S$ (i.e.,
$S \models P$ ($S \not\models P$)), if and only if $P$ is true for every string in the language
$\mathcal{L}(S)$ (false for some string in $\mathcal{L}(S)$). By definition, every string in
the language has an accepting run. Therefore, it is only necessary to verify the property for
strings that have an accepting run. By definition, every accepting run begins with an initial
state. Therefore, every state in an accepting run is accessible from an initial state, and
every atom (c-state) in a string of the language is accessible from an initial state. Clearly,
the only states and atoms that need to be involved in verification are those accessible from
initial states.

Invariance properties can be re-expressed in terms of accessibility. Invariance property
$\Box \neg p$ could be restated as saying that there does not exist any atom $a$, where
$a \le p$, that is accessible from an initial state. It is much more difficult to express
Response properties succinctly using accessibility. Nevertheless, accessibility plays a key
role in verifying all properties, as will be seen shortly.

There are a number of ways to perform model checking, but here we focus on two. 
The first method is specifically tailored for one class of properties; the second is sufficiently 
general for use in verifying many classes of properties. The rationale for choosing a specific 
and a general algorithm is that this allows for a comparison to determine the computational 
efficiency gained by property-specific tailoring (see Subsection 6.5). In this section, we give 
high-level sketches of these two model checking algorithms. The full algorithms are in 
Section 6. 

The first algorithm is a very simple and efficient method tailored for Invariance properties
$P = \Box \neg p$. For every initial state $v_i$, this method begins at $v_i$ and visits every
atom $a_j$ accessible from $v_i$. If this atom has not already been checked, it checks to see
whether $a_j \le p$. If $a_j \le p$, then this is considered a verification failure. If there
are no failures, verification succeeds.
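The high-level description of the Invariance check can be sketched as follows. This is only an illustration of the sketch given in this section, not the full algorithm of Section 6. It assumes conditions are frozensets of atoms and that $p$ is given as the set of atoms below it; the function name verify_invariance and the toy matrix are hypothetical.

```python
def verify_invariance(M, initial, p):
    """Check Box(not p): no atom below p may be accessible from an initial state.

    Returns (True, None) on success and (False, atom) on the first failure.
    """
    checked = set()              # atoms that have already been tested
    visited = set(initial)
    stack = list(initial)
    while stack:
        u = stack.pop()
        for (src, w), cond in M.items():
            if src != u:
                continue
            for a in cond:
                if a in checked:
                    continue
                checked.add(a)
                if a in p:       # a <= p: verification failure
                    return False, a
            if w not in visited:
                visited.add(w)
                stack.append(w)
    return True, None

inv_M = {
    ("s0", "s0"): frozenset({"a"}),
    ("s0", "s1"): frozenset({"b"}),
    ("s1", "s0"): frozenset({"a", "b"}),
    ("s2", "s2"): frozenset({"bad"}),
}
```

With initial state "s0", the atom "bad" is never visited because state "s2" is not accessible, so the property that forbids "bad" is verified even though the atom appears in the matrix.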

The second method, automata-theoretic (AT) model checking, is very popular in the
verification literature (e.g., see Vardi and Wolper, 1986) and can be used to verify any
property expressible as a finite-state automaton. It is used here for First-Response
properties. In AT model checking, asking whether $S \models P$ is equivalent to asking whether
$\mathcal{L}(S) \subseteq \mathcal{L}(P)$ for property $P$. This is equivalent to
$\mathcal{L}(S) \cap \overline{\mathcal{L}(P)} = \emptyset$ (where $\overline{\mathcal{L}(P)}$
denotes the complement of $\mathcal{L}(P)$), which is algorithmically tested by first taking
the tensor product of the plan FSA $S$ and the FSA corresponding to $\neg P$ (i.e.,
$S \otimes \neg P$). The FSA corresponding to $\neg P$ accepts $\overline{\mathcal{L}(P)}$. The
tensor product implements language intersection. The algorithm then determines whether
$\mathcal{L}(S \otimes \neg P) \neq \emptyset$, which implies
$\mathcal{L}(S) \cap \overline{\mathcal{L}(P)} \neq \emptyset$ ($S \not\models P$). This
determination is implemented as a check for cycles in the product FSA $S \otimes \neg P$ that
are accessible from some initial state and that satisfy any other conditions in the FSA
acceptance criterion. Recall that a cycle is a sequence of vertices $(v_0, \ldots, v_n)$ such
that $v_n = v_0$. A cycle is accessible from an initial state if one of its vertices is
accessible from the initial state. A cycle that is accessible from an initial state and that
satisfies the FSA acceptance criterion implies a nonempty language. This is because a string is
in the language of an FSA if it is an infinite-length sequence of actions satisfying the FSA
acceptance criterion, which always includes the requirement that its accepting run must begin
in an initial state. All infinite behavior eventually ends up in a cycle because the FSA has a
finite number of states.

Therefore, to be certain that the language is nonempty, it is necessary to determine
whether any accessible cycle satisfies the FSA acceptance criterion. The criterion of interest
is the Büchi criterion, for the following reason. It is assumed here that the negation of the
property ($\neg P$) is expressed as a Büchi automaton. This implies that the FSA being
searched, i.e., $S \otimes \neg P$, is also a Büchi automaton, because taking the tensor
product preserves this criterion. The final check of this algorithm is whether an accessible
cycle in $S \otimes \neg P$ satisfies the Büchi acceptance criterion, because in that case the
language is not empty. A product state $s$ is in $B(S \otimes \neg P)$ whenever it has a
component state in $B(\neg P)$, e.g., (COLLECTING, RECEIVING, RECEIVING, 2) is in
$B(S \otimes \neg$P2$)$ for property P2 because its fourth component is state 2 of
$B(\neg$P2$)$. According to the Büchi acceptance criterion, visiting a state
$v \in B(S \otimes \neg P)$ infinitely often (assuming $v$ is accessible from an initial state)
implies $\mathcal{L}(S \otimes \neg P) \neq \emptyset$. This will happen if $v$ is part of an
accessible cycle. In that case, $S \not\models P$ and verification fails. Otherwise, if no
accessible product state $v \in B(S \otimes \neg P)$ is visited infinitely often (i.e., it is
not in a cycle), then $\mathcal{L}(S \otimes \neg P) = \emptyset$ and therefore
$\mathcal{L}(S) \subseteq \mathcal{L}(P)$, i.e., $S \models P$ and verification succeeds. A
relatively efficient algorithm for AT verification from the literature is presented in
Section 6.
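The emptiness check described above can be illustrated with a deliberately naive sketch. It is not the efficient algorithm of Section 6: it simply recomputes reachability for each candidate bad state. It assumes the plan FSA and the Büchi FSA for $\neg P$ label their edges with atoms of the same algebra, so the meet of two conditions is set intersection. All function names and the toy automata are hypothetical.

```python
def successors(M, u):
    return [w for (src, w), cond in M.items() if src == u and cond]

def reachable(M, starts):
    """Vertices reachable from `starts` by a path with at least one edge."""
    seen, stack = set(), list(starts)
    while stack:
        u = stack.pop()
        for w in successors(M, u):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def at_verify(plan_M, plan_init, neg_M, neg_init, neg_bad):
    """True iff the language of the product of S and not-P is empty, i.e. S |= P."""
    M = {}
    for (u, w), c1 in plan_M.items():
        for (s, t), c2 in neg_M.items():
            cond = c1 & c2                       # meet over the shared algebra
            if cond:
                M[((u, s), (w, t))] = cond
    init = {(u, s) for u in plan_init for s in neg_init}
    candidates = init | reachable(M, init)
    for v in candidates:
        # A bad product state that can reach itself lies on an accessible cycle.
        if v[1] in neg_bad and v in reachable(M, [v]):
            return False
    return True

# Buchi FSA for the negation of Box(not bad): state 2 is the bad state.
NEG_M = {
    (1, 1): frozenset({"a"}),
    (1, 2): frozenset({"bad"}),
    (2, 2): frozenset({"a", "bad"}),
}
safe_plan = {("s0", "s0"): frozenset({"a"})}
unsafe_plan = {("s0", "s0"): frozenset({"a", "bad"})}
```

The plan that never permits "bad" yields a product with no accessible bad state, so verification succeeds; the plan that permits "bad" reaches the product state ("s0", 2), which has a self-loop, so verification fails.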

3.4 Machine Learning to Adapt Plans 

Given plan $S$ and property $P$, model checking determines whether $S \models P$. Next we
consider the case of learning, which is a change to $S$. This subsection addresses the issue of
how a learning operator can affect a plan $S$ to generate a new plan $S'$.

We begin by presenting a taxonomy of FSA learning operators. It is likely that any 
learning method for complete deterministic FSAs will be composed of one or more of these 
operators. Nothing about our approach requires evolutionary learning per se; however to 
make the discussion concrete, this is the form of learning that is assumed here. In the 
context of evolutionary algorithms, the FSA learning operators are perturbations, such as 
mutations, applied to the FSAs. 

Procedure EA
    t = 0; /* initial generation */
    initialize_population(t);
    evaluate_fitness(t);
    until termination-criterion do
        t = t + 1; /* next generation */
        select_parents(t);
        perturb(t);
        evaluate_fitness(t);
    enduntil
end procedure

Figure 6: The outline of an evolutionary algorithm.

We assume that learning occurs in two phases: the offline and online phases (see Fig- 
ure 1). During the offline phase, each agent starts with a randomly initialized population of 
candidate FSA plans. This population is evolved using the evolutionary algorithm outlined 
in Figure 6. The main loop of this algorithm consists of selecting parent plans from the 
population, applying perturbation operators to the parents to produce offspring, evaluat- 
ing the fitness of the offspring, and then returning the offspring to the population if they 
are sufficiently fit. After this evolution, verification and repair are done to these initially 
generated plans. 
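The loop of Figure 6 can be sketched as a small runnable program. Everything specific in this sketch is an assumption made for illustration: individuals are bit tuples rather than FSA plans, parents are chosen uniformly at random, an offspring is "sufficiently fit" if it is at least as fit as the current worst individual, and termination is a fixed number of generations. The names evolve and flip_one are hypothetical.

```python
import random

def evolve(init_pop, fitness, perturb, generations, rng):
    """Generic evolutionary loop in the shape of Figure 6."""
    population = list(init_pop)                        # initialize_population
    scores = [fitness(ind) for ind in population]      # evaluate_fitness
    for _ in range(generations):                       # until termination-criterion
        parents = rng.sample(population, k=min(2, len(population)))  # select_parents
        offspring = [perturb(p, rng) for p in parents]               # perturb
        for child in offspring:                                      # evaluate_fitness
            score = fitness(child)
            worst = min(range(len(population)), key=scores.__getitem__)
            if score >= scores[worst]:                 # keep sufficiently fit offspring
                population[worst], scores[worst] = child, score
    best = max(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best]

def flip_one(ind, rng):
    """Toy perturbation: flip one randomly chosen bit."""
    i = rng.randrange(len(ind))
    return ind[:i] + (1 - ind[i],) + ind[i + 1:]

rng = random.Random(0)
start_pop = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(6)]
best, best_score = evolve(start_pop, sum, flip_one, 50, rng)
```

Because an offspring only ever replaces the current worst individual, the best fitness in the population never decreases from one generation to the next. In the setting of this paper, the perturbation step is where the FSA learning operators would be applied, followed by reverification.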

At the start of the online phase, each agent selects one "best" (according to its "fitness 
function") plan from its population for execution. The agents are then fielded and plan 
execution is interleaved with learning (adaptation), reverification, and plan repair as needed. 
The purpose of learning during the online phase is to fine-tune the plan and adapt it to 
keep pace with a gradually shifting environment, since normally real-world environments 
are not static. The evolutionary algorithm of Figure 6 is also used during this phase, but the 
assumption is a population size of one and incremental learning (i.e., one learning operator
applied per FSA per generation). This is practical for situations in which the environment
changes gradually, rather than radically.

Formally, a machine learning operator $o : S \to S'$ changes a (product or individual)
FSA $S$ to post-learning FSA $S'$. A mapping between two automata $S$ and $S'$ is defined as a
mapping between their elements (Bavel, 1983). At the highest level, we can subdivide the
learning operators according to the elements of the FSA that they alter:

• One class of operators adds, deletes, or moves edge transition conditions. In other
words, $o : M_{\mathcal{K}}(S) \to M_{\mathcal{K}}(S')$.

• Another class of operators adds, deletes, or moves edges, i.e., $o : E(S) \to E(S')$.

• The third class of operators adds or deletes vertices, along with their edges, i.e.,
$o : V(S) \to V(S')$ and $o : E(S) \to E(S')$.

• The fourth class of operators changes the Boolean algebra used in the transition
conditions, i.e., $o : \mathcal{K} \to \mathcal{K}'$.

Here, we do not define operators that add or delete states. In other words, we do not 
address the third class of operators. The reason is that with the type of FSAs used here, 
adding or deleting a state does not, in itself, affect properties. It is what we do with the 
edges to/from a state and their transition conditions that can alter whether a property 
is true or false for a plan. This is because properties are true or false for comp-states 
(atoms) rather than for FSA states. Furthermore, this paper does not address changes to 
the Boolean algebra, which is the fourth class of operators. This class of operators, which 
includes abstractions, is addressed in Gordon (1998). 

Therefore we are focusing on the first and second classes of operators. We define operator 
schemas, rather than operators. A machine learning operator schema applies to unspecified 
(variable) vertices, edges, and transition conditions. When instantiated with particular 
vertices, edges, and transition conditions, it becomes a machine learning operator. In order 
to avoid tedium, the operator schema definitions consider only the relevant parts of the 
FSA, e.g., those parts that get altered. There is an implicit assumption that all unspecified 
parts of the FSA remain the same after operator application. There is also an assumption 
that the learner ensures that all operators keep the automaton complete and deterministic. 

The operators can be seen in the taxonomy (partition) of Figure 7. We define each of
the corresponding operator schemas as follows, beginning with the most general one, called
$o_{change}$, which changes edge transition conditions:

Operator Schema 1 ($o_{change}$) Let $S$ be an FSA with Boolean algebra $\mathcal{K}$, and let
$o_{change} : S \to S'$. Then we define $o_{change} : M_{\mathcal{K}}(S) \to
M_{\mathcal{K}}(S')$. In particular, suppose $z \le M_{\mathcal{K}}(v_1, v_2)$, $z \neq 0$, for
$(v_1, v_2) \in E(S)$ and $z \not\le M_{\mathcal{K}}(v_1, v_3)$ for $(v_1, v_3) \in E(S)$. Then
$o_{change}(M_{\mathcal{K}}(v_1, v_2)) = M_{\mathcal{K}}(v_1, v_2) \wedge \neg z$ (step 1)
and/or $o_{change}(M_{\mathcal{K}}(v_1, v_3)) = M_{\mathcal{K}}(v_1, v_3) \vee z$ (step 2). In
other words, $o_{change}$ may consist of two steps: the first to remove condition $z$ from edge
$(v_1, v_2)$, and the second to add (the same) condition $z$ to edge $(v_1, v_3)$.
Alternatively, $o_{change}$ may consist of only one of these two steps.

[Figure 7: Taxonomy (partition) of learning operators. The hierarchy is rooted at
$o_{change}$; its left subtree contains the two-step operators (step (1) and step (2)), and its
right subtree contains the one-step operators (step (1) or step (2) alone).]
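Operator Schema 1 translates almost directly into code when conditions are sets of atoms. The sketch below is illustrative only: it checks that $z$ is nonzero and, when step 1 is applied, that $z$ lies below the condition on edge $(v_1, v_2)$, but it does not enforce completeness or determinism of the result, which the paper leaves to the learner. The function name o_change and its step1 and step2 flags are hypothetical.

```python
def o_change(M, v1, v2, v3, z, step1=True, step2=True):
    """Move condition z from edge (v1, v2) to edge (v1, v3).

    M maps edges to frozensets of atoms. A new matrix is returned and the
    input is left untouched. Either step may be switched off.
    """
    z = frozenset(z)
    if not z:
        raise ValueError("z must be nonzero")
    new = dict(M)
    if step1:
        old = M.get((v1, v2), frozenset())
        if not z <= old:
            raise ValueError("z must lie below M(v1, v2)")
        new[(v1, v2)] = old - z                               # M(v1, v2) meet not-z
    if step2:
        new[(v1, v3)] = new.get((v1, v3), frozenset()) | z    # M(v1, v3) join z
    # An edge whose condition is 0 does not exist.
    return {edge: cond for edge, cond in new.items() if cond}

change_M = {
    ("s0", "s0"): frozenset({"a", "b"}),
    ("s0", "s1"): frozenset({"c"}),
}
moved = o_change(change_M, "s0", "s0", "s1", {"b"})
```

Here the atom "b" is removed from the self-loop on "s0" and added to the edge into "s1". Applying both steps with the same $z$ keeps the join of the outgoing conditions of "s0" unchanged, so a complete FSA stays complete.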



All of the remaining operators are easier to describe in terms of a set of four primitive
operators. Therefore, we next define these four primitives, which are one-step operators
that are special cases of $o_{change}$ and appear at the bottom right as leaves in the
hierarchy of Figure 7. The first two primitive operators delete ($o_{delete}$) and add
($o_{add}$) edges. We define $o_{delete}$ to delete edge $(v_1, v_2)$ with the operator schema:

Operator Schema 2 ($o_{delete}$) Let $S$ be an FSA with Boolean algebra $\mathcal{K}$, and let
$o_{delete} : S \to S'$ be defined with $o_{delete} : E(S) \to E(S) \setminus \{(v_1, v_2)\}$
for deleted edge $(v_1, v_2)$ of $S$. Recall that a nonexistent edge has transition condition
0. Operator $o_{delete}$ could therefore be considered a special case of $o_{change}$ that
consists only of step (1) and an additional condition that must be met, namely, that
$o_{delete}(M_{\mathcal{K}}(v_1, v_2)) = (M_{\mathcal{K}}(v_1, v_2) \wedge \neg z) = 0$.

We define $o_{add}$ to add edge $(v_1, v_3)$ with the operator schema:

Operator Schema 3 ($o_{add}$) Let $S$ be an FSA with Boolean algebra $\mathcal{K}$, and let
$o_{add} : S \to S'$ be defined with $o_{add} : E(S) \to E(S) \cup \{(v_1, v_3)\}$ for added
edge $(v_1, v_3)$ of $S$. Operator $o_{add}$ could be considered a special case of $o_{change}$
that consists only of step (2) and the additional condition that
$M_{\mathcal{K}}(v_1, v_3) = 0$ prior to applying $o_{add}$.

The other two primitive operators are specialization ($o_{spec}$) and generalization
($o_{gen}$). Specialization and generalization are operators commonly found in the machine
learning literature, e.g., see Michalski (1983). In the context of an FSA, specialization
lowers the level of a particular state-to-state transition condition in the partial order
$\le$, whereas generalization raises it, as in Mitchell's Version Spaces (Mitchell, 1978). In
particular, a transition condition can be specialized with a meet and can be generalized with a
join, which is analogous to adding a conjunct to specialize and a disjunct to generalize as in
Michalski (1983).

Formally, we define specialization and generalization, respectively, as follows:

Operator Schema 4 ($o_{spec}$) Let $S$ be an FSA with Boolean algebra $\mathcal{K}$, and let
$o_{spec} : S \to S'$. Then we can define $o_{spec} : M_{\mathcal{K}}(S) \to
M_{\mathcal{K}}(S')$, where $o_{spec}(M_{\mathcal{K}}(v_1, v_2)) = M_{\mathcal{K}}(v_1, v_2)
\wedge \neg z$, for some $z \in \mathcal{K}$, $z \neq 0$. Operator $o_{spec}$ could be
considered a special case of $o_{change}$ that consists only of step (1) and the additional two
conditions $o_{spec}(M_{\mathcal{K}}(v_1, v_2)) = (M_{\mathcal{K}}(v_1, v_2) \wedge \neg z)
\neq 0$ (i.e., $o_{spec} \neq o_{delete}$), and $M_{\mathcal{K}}(v_1, v_2) \neq \neg z$ (since
otherwise $o_{spec}$ has no effect).

Operator Schema 5 ($o_{gen}$) Let $S$ be an FSA with Boolean algebra $\mathcal{K}$, and let
$o_{gen} : S \to S'$. Then we can define $o_{gen} : M_{\mathcal{K}}(S) \to
M_{\mathcal{K}}(S')$, where $o_{gen}(M_{\mathcal{K}}(v_1, v_3)) = M_{\mathcal{K}}(v_1, v_3)
\vee z$, for some $z \in \mathcal{K}$, $z \neq 0$. Operator $o_{gen}$ could be considered a
special case of $o_{change}$ that consists only of step (2) and the two additional conditions
that $M_{\mathcal{K}}(v_1, v_3) \neq 0$ (i.e., $o_{gen} \neq o_{add}$) and
$(M_{\mathcal{K}}(v_1, v_3) \wedge z) = 0$ (because otherwise $z$ adds redundancy) prior to
$o_{gen}$.
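The four primitives differ only in their side conditions, which the following sketch makes explicit. It assumes conditions are frozensets of atoms stored in a dictionary keyed by edge, and it rejects an application whose side condition fails by raising ValueError. The function names mirror the operator names, but the signatures are hypothetical.

```python
def o_delete(M, v1, v2):
    """Delete edge (v1, v2): its condition becomes 0."""
    if (v1, v2) not in M:
        raise ValueError("edge does not exist")
    return {e: c for e, c in M.items() if e != (v1, v2)}

def o_add(M, v1, v3, z):
    """Add edge (v1, v3) with nonzero condition z; M(v1, v3) must be 0 beforehand."""
    z = frozenset(z)
    if (v1, v3) in M or not z:
        raise ValueError("edge already exists or z is 0")
    return {**M, (v1, v3): z}

def o_spec(M, v1, v2, z):
    """Specialize: M(v1, v2) meet not-z, which must stay nonzero and must change."""
    old = M[(v1, v2)]
    new = old - frozenset(z)
    if not new or new == old:
        raise ValueError("use o_delete, or z has no effect")
    return {**M, (v1, v2): new}

def o_gen(M, v1, v3, z):
    """Generalize: M(v1, v3) join z, for an existing edge and a z disjoint from it."""
    old = M[(v1, v3)]
    z = frozenset(z)
    if not z or old & z:
        raise ValueError("z is 0 or adds redundancy")
    return {**M, (v1, v3): old | z}

prim_M = {
    ("s0", "s0"): frozenset({"a", "b"}),
    ("s0", "s1"): frozenset({"c"}),
}
specialized = o_spec(prim_M, "s0", "s0", {"b"})
generalized = o_gen(specialized, "s0", "s1", {"b"})
```

Applying o_spec and then o_gen with the same $z$ reproduces the two-step form of $o_{change}$, while o_delete followed by o_add on the same edge with the same condition restores the original matrix.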

Next, 10 learning operators are defined from these four primitives. Below $o_{change}$ in
the operator hierarchy of Figure 7 are two subtrees. The right subtree consists of one-step
operators, and the left subtree consists of two-step operators. We define the two one-step
operators just below $o_{change}$ first (since we just defined the primitive operators below
them):

Operator Schema 6 ($o_{delete \vee spec}$) This operator consists of applying either of the
primitive operators $o_{delete}$ or $o_{spec}$.

Operator Schema 7 ($o_{add \vee gen}$) This operator consists of applying either of the
primitive operators $o_{add}$ or $o_{gen}$.

It is relevant at this point to introduce two more operators that are not in the hierarchy
of Figure 7. They are not in the hierarchy because they are merely minor variants of
$o_{delete \vee spec}$ and $o_{add \vee gen}$, and they do not belong strictly below our most general operator
$o_{change}$. These operators are introduced here because they are very useful and also because
they are guaranteed to preserve completeness of FSAs. In other words, if the FSA is
complete prior to applying these operators then it will be complete after applying them.
Recall from Section 2 that each FSA state is associated with a set of allowable actions that
may be taken from that state. These operators delete an action from, or add an action to,
the set of allowable actions from a state:

Operator Schema 8 ($o_{delete\text{-}action}$) Delete an allowable action from a state $v_1$ by one
or more applications of operator $o_{delete \vee spec}$. Each application may be to a different outgoing
edge from $v_1$.

Operator Schema 9 ($o_{add\text{-}action}$) Add an allowable action from a state $v_1$ by one or
more applications of operator $o_{add \vee gen}$. Each application may be to a different outgoing
edge from $v_1$.

To understand why $o_{delete\text{-}action}$ consists of one or more applications of $o_{delete \vee spec}$,
consider the following example. In Figure 2, deleting F-collect as an allowable action from
F's COLLECTING state results in F-deliver being the only allowable action from that state.
Furthermore, this results in the edge (COLLECTING, COLLECTING) being deleted and
the edge (COLLECTING, DELIVERING) being specialized. The reasoning is similar for
why $o_{add\text{-}action}$ is one or more applications of $o_{add \vee gen}$.

The remaining operators, which are all of the operators on the left subtree of $o_{change}$ in
Figure 7, consist of two steps: the first to remove condition $z$ from edge $(v_1, v_2)$, and the
second to add (the same) condition $z$ to edge $(v_1, v_3)$. The first step consists of applying
one primitive operator, and the second step consists of applying another primitive operator.
Every one of the following operators preserves determinism and completeness of the FSAs.
In other words, if the FSA is deterministic and complete prior to operator application then
it will be deterministic and complete afterwards.

Figure 8: Moving transition conditions between edges.

Operator Schema 10 ($o_{move}$) This operator schema is identical to that of $o_{change}$, with
one exception. Replace "and/or" with "and" in the definition. In other words, we have
$o_{move}(M_{\mathcal{K}}(v_1, v_2)) = M_{\mathcal{K}}(v_1, v_2) \wedge \neg z$ and $o_{move}(M_{\mathcal{K}}(v_1, v_3)) = M_{\mathcal{K}}(v_1, v_3) \vee z$ for some
$(v_1, v_2), (v_1, v_3) \in E(S)$. Therefore $o_{move}$ moves $z$ from one edge to another.

All of the remaining operators are special cases of $o_{move}$. We begin with the right subtree
of $o_{move}$:

Operator Schema 11 ($o_{delete+add}$) Apply $o_{delete}$ to edge $(v_1, v_2)$ and then apply $o_{add}$ to
edge $(v_1, v_3)$.

An example of $o_{delete+add}$, using Figure 8, is to delete edge (STATE1, STATE3) (i.e.,
make $M_{\mathcal{K}}(\text{STATE1}, \text{STATE3}) = 0$) and add a new edge (STATE1, STATE1) with transition
condition $M_{\mathcal{K}}(\text{STATE1}, \text{STATE1}) = c$.

Operator Schema 12 ($o_{spec+add}$) Apply $o_{spec}$ to edge $(v_1, v_2)$ and then apply $o_{add}$ to edge
$(v_1, v_3)$.

For example, using Figure 8, we can move "b" from edge (STATE1, STATE2) to a
newly created edge (STATE1, STATE1) to make $M_{\mathcal{K}}(\text{STATE1}, \text{STATE2}) = a \wedge \neg b$ and
$M_{\mathcal{K}}(\text{STATE1}, \text{STATE1}) = b$. This is specialization of the condition on edge (STATE1,
STATE2) followed by addition of edge (STATE1, STATE1).

Next consider the left subtree of $o_{move}$. At this point, it is relevant to examine the
reason for the split into the two subtrees of $o_{move}$. All of the operators in the left subtree
satisfy a condition that is called the "accessibility condition." This condition states that
prior to learning (and also after learning), if vertex $v_1$ is accessible from some initial state
then vertex $v_3$ is guaranteed to also be accessible from that initial state. The reason for
this partition will become clear in Subsection 4.2, where we show that a theorem holds for
the two-step operators if and only if the accessibility condition is true. The two operators
in the right subtree of $o_{move}$ fail to satisfy the accessibility condition because they have
$o_{add}$ as their second step. The definition of $o_{add}$ states that $M_{\mathcal{K}}(v_1, v_3) = 0$
prior to operator application, and therefore we have no guarantee of $v_3$'s accessibility, given
that $v_1$ is accessible from an initial state. The following are the definitions of the operators
for which the accessibility condition is true:

Operator Schema 13 ($o_{delete+gen}$) Apply $o_{delete}$ to edge $(v_1, v_2)$ and then apply $o_{gen}$ to
edge $(v_1, v_3)$.

As an example, in Figure 8, we can move the condition "$a \vee b$" from edge (STATE1,
STATE2) to edge (STATE1, STATE3) to make $M_{\mathcal{K}}(\text{STATE1}, \text{STATE2}) = 0$ and to make
$M_{\mathcal{K}}(\text{STATE1}, \text{STATE3}) = c \vee a \vee b$. This is deletion of edge (STATE1, STATE2) followed
by generalization of the transition condition on edge (STATE1, STATE3).

Operator Schema 14 ($o_{spec+gen}$) Apply $o_{spec}$ to edge $(v_1, v_2)$ and then apply $o_{gen}$ to edge
$(v_1, v_3)$.

As an example, in Figure 8, we can move the disjunct "b" from edge (STATE1, STATE2)
to edge (STATE1, STATE3) to make $M_{\mathcal{K}}(\text{STATE1}, \text{STATE2}) = a \wedge \neg b$ and
$M_{\mathcal{K}}(\text{STATE1}, \text{STATE3}) = c \vee b$. This is a specialization of the transition condition on edge
(STATE1, STATE2) followed by a generalization of the transition condition on edge (STATE1, STATE3).

Operator Schema 15 ($o_{stay}$) The definition is the same as that of $o_{move}$, with one
exception. Replace vertex $v_3$ with vertex $v_1$ everywhere. In other words, the operator consists
of moving a condition from edge $(v_1, v_2)$ to edge $(v_1, v_1)$.

Note that each operator instantiation of the schema for $o_{stay}$ will be a special case of one
of the following: $o_{delete+add}$, $o_{spec+add}$, $o_{delete+gen}$, or $o_{spec+gen}$. It is considered $o_{stay}$ if and
only if on the second step of the operator the transition condition is moved to edge $(v_1, v_1)$.
For example, using Figure 8, when we applied operator $o_{spec+add}$ (in the example above)
to move the disjunct "b" from edge (STATE1, STATE2) to edge (STATE1, STATE1) to
make $M_{\mathcal{K}}(\text{STATE1}, \text{STATE2}) = a \wedge \neg b$ and $M_{\mathcal{K}}(\text{STATE1}, \text{STATE1}) = b$, this could
be considered an instantiation of $o_{stay}$, as well as $o_{spec+add}$. Likewise, when we applied
$o_{delete+add}$ to delete edge (STATE1, STATE3) and add edge (STATE1, STATE1) with "c"
as the transition condition, this could also be considered an instantiation of $o_{stay}$.
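
The relationship among the two-step schemas can be sketched with a single function that moves a condition and reports which schema instance was applied. This is a minimal illustration, assuming transition conditions are modeled as sets of atoms and that the moved condition $z$ is a nonzero part of the source edge's condition; the table layout and the name `o_move` as a Python function are hypothetical, and the edge labels are those described in the text for Figure 8.

```python
ZERO = frozenset()


def o_move(M, v1, v2, v3, z):
    """Move condition z from edge (v1, v2) to edge (v1, v3).

    M maps (source, target) to a frozenset of atoms; a missing key means 0.
    Assumes z is a nonzero part of the condition on (v1, v2).
    Returns (new table, schema name, whether this is also an o_stay instance).
    """
    src = M.get((v1, v2), ZERO)
    dst = M.get((v1, v3), ZERO)
    if v2 == v3 or z == ZERO or not z <= src:
        raise ValueError("need distinct edges and a nonzero z taken from (v1, v2)")
    first = "delete" if src == z else "spec"   # step 1: remove z
    second = "gen" if dst != ZERO else "add"   # step 2: add z
    new = dict(M)
    if src == z:
        del new[(v1, v2)]                      # condition became 0: edge deleted
    else:
        new[(v1, v2)] = src - z
    new[(v1, v3)] = dst | z
    return new, f"o_{first}+{second}", v3 == v1


# Conditions on the outgoing edges of STATE1 as described for Figure 8.
fig8 = {
    ("STATE1", "STATE2"): frozenset({"a", "b"}),
    ("STATE1", "STATE3"): frozenset({"c"}),
}

_, name, is_stay = o_move(fig8, "STATE1", "STATE2", "STATE3", frozenset({"b"}))
print(name, is_stay)   # o_spec+gen False

_, name, is_stay = o_move(fig8, "STATE1", "STATE3", "STATE1", frozenset({"c"}))
print(name, is_stay)   # o_delete+add True
```

The second call reproduces the observation above that deleting (STATE1, STATE3) and adding (STATE1, STATE1) with "c" is both an $o_{delete+add}$ and an $o_{stay}$ instance.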

Operator $o_{stay}$ is an especially useful operator. It makes the reasonable assumption that
when an agent no longer wants to transition to another state (e.g., an edge is deleted),
the agent just stays in its current state. In other words, the condition for transitioning to
another state is transferred to the edge leading back to the current state. For example,
suppose rover I becomes stuck at the lander and cannot rendezvous with F for an
indeterminate period of time. It could generate a temporary plan (see Figure 2) that keeps
I in its DELIVERING state by deleting edge (DELIVERING, RECEIVING) and making
$M_{\mathcal{K}}(\text{DELIVERING}, \text{DELIVERING}) = 1$ (and DELIVERING would have to become an
initial state).

Recall that accessibility is a key issue for verification. Now that we have a set of
operator schemas, let us consider how these operators affect accessibility from initial states.
Clarifying this will be relevant for understanding both the a priori proofs about property
preservation, and the motivation for the incremental reverification algorithms. There are
two fundamental ways that our learning operators may affect accessibility: locally
(abbreviated "L"), i.e., by directly altering the accessibility of atoms or states, or globally
(abbreviated "G"), i.e., by altering accessibility of states or atoms that could be visited
after the part of the FSA modified by the learning operator. In particular, any change to
the accessibility of $v_1$, $v_2$, $v_3$ or atoms in $M_{\mathcal{K}}(v_1, v_2)$ or $M_{\mathcal{K}}(v_1, v_3)$, referenced in the
operator definition, is considered local. Changes to the accessibility of any other states or atoms
are considered global.

As an example of an L (local) change to accessibility, using Figure 8, suppose the agent
discovers a new action "d" that it can take. It adds "d" to its action repertoire, as well as
to the set of allowable actions from one of the states in its FSA. In particular, the agent
decides to allow "d" from STATE1 and decides to apply $o_{gen}$ to the transition condition for
(STATE1, STATE3) to get condition "$c \vee d$." Atom "d" was not previously accessible
from any initial state, but if we assume STATE1 is accessible from an initial state then the
application of $o_{gen}$ made the atom "d" accessible. Using Figure 8 to illustrate a G (global)
change to accessibility, suppose we delete edge (STATE1, STATE3) in that figure. Then
STATE4, which was previously accessible (because we assume STATE1 is accessible), is no
longer accessible. On the other hand, the fact that STATE3 is no longer accessible is a local
change.
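
The local versus global distinction can be made concrete with a small reachability computation. The following is a sketch only: the function name `accessible_states`, the table representation, and the condition "e" on edge (STATE3, STATE4) are assumptions introduced for illustration, since the text states only that STATE4 is reached through STATE3.

```python
from collections import deque


def accessible_states(M, initial):
    """States reachable from any initial state along edges with nonzero condition."""
    seen = set(initial)
    queue = deque(initial)
    while queue:
        v = queue.popleft()
        for (src, dst), cond in M.items():
            if src == v and cond and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


# Edges follow the Figure 8 description; the condition "e" on
# (STATE3, STATE4) is a hypothetical placeholder.
before = {
    ("STATE1", "STATE2"): frozenset({"a", "b"}),
    ("STATE1", "STATE3"): frozenset({"c"}),
    ("STATE3", "STATE4"): frozenset({"e"}),
}
# Apply o_delete to edge (STATE1, STATE3).
after = {e: c for e, c in before.items() if e != ("STATE1", "STATE3")}

referenced = {"STATE1", "STATE3"}  # states named in the operator application
lost = accessible_states(before, ["STATE1"]) - accessible_states(after, ["STATE1"])
local_loss = lost & referenced     # change to a state the operator refers to
global_loss = lost - referenced    # change further downstream
print(sorted(local_loss), sorted(global_loss))  # ['STATE3'] ['STATE4']
```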

Now we are ready to summarize what the learning operators can do to accessibility.
First, we introduce one more notational convenience. The symbols $\uparrow$ and $\downarrow$ denote "can
increase" and "can decrease," respectively, and $\not\uparrow$ and $\not\downarrow$ denote "cannot increase" and
"cannot decrease," respectively. We use these symbols with G and L, e.g., $\uparrow$G means that a
learning operator can (but does not necessarily) increase global accessibility, and $\not\downarrow$L means
that an operator cannot decrease local accessibility.

The results for the primitive operators are intuitively obvious:

• $o_{delete}$: $\downarrow$G $\downarrow$L $\not\uparrow$G $\not\uparrow$L

• $o_{spec}$: $\not\downarrow$G $\downarrow$L $\not\uparrow$G $\not\uparrow$L

• $o_{add}$: $\not\downarrow$G $\not\downarrow$L $\uparrow$G $\uparrow$L

• $o_{gen}$: $\not\downarrow$G $\not\downarrow$L $\not\uparrow$G $\uparrow$L

The primitive operators provide answers about changes in accessibility for all of the
one-step operators. For the two-step operators (i.e., $o_{move}$ and all operators below it in the
hierarchy of Figure 7), we need to consider the net effect. For the results in this paper, we
only need to focus on one distinction: the difference in the net effect for those operators
that satisfy the accessibility condition (i.e., the left subtree of $o_{move}$) versus the net effect for
those operators that do not satisfy this condition (i.e., the right subtree). The net effect of
those operators that satisfy the accessibility condition is that accessibility (global and local)
will never be increased, i.e., $\not\uparrow$G and $\not\uparrow$L. The reason is as follows. By looking at the results
for the primitive operators, it is apparent that the first step in these two-step operators can
never increase accessibility, because the first step is always $o_{delete}$ or $o_{spec}$. Therefore, to
understand the intuition behind this result we need to examine the second step. Consider
$o_{delete+gen}$ and $o_{spec+gen}$. Note that $o_{gen}$ does not increase global accessibility ($\not\uparrow$G), but
it can increase local accessibility ($\uparrow$L). Is $\uparrow$L a net effect due to the generalization step?
Because atoms are being transferred from one outgoing edge of some vertex $v_1$ to another
outgoing edge of $v_1$ with these two operators, by definition the local accessibility of those
atoms from an initial state will not be increased as a net effect. In other words, the atoms
are accessible from an initial state if and only if $v_1$ is, and these two learning operators
do not increase the accessibility of $v_1$. Furthermore, by definition $M_{\mathcal{K}}(v_1, v_3) \neq 0$ prior to
learning, so the accessibility of $v_3$ is not increased. We conclude that $\not\uparrow$L is a net effect.

A similar line of reasoning explains why operator $o_{stay}$ will not increase local accessibility.
Operator $o_{stay}$ cannot increase global accessibility, even if it adds an edge, because the only
edge that this operator could add is $(v_1, v_1)$. In conclusion, all three operators that satisfy
the accessibility condition have a net effect of not increasing accessibility ($\not\uparrow$G and $\not\uparrow$L). On
the other hand, because operators $o_{delete+add}$ and $o_{spec+add}$ have $o_{add}$ as their second step,
they can increase accessibility.

Results from lower in the hierarchy of Figure 7 are inherited up the tree. For example,
because $o_{delete+add}$ can increase global accessibility, $o_{move}$ can as well. The following is a
summary of the relevant results we have so far about how the two-step learning operators
can change accessibility. To avoid overwhelming the reader, we present only those results
necessary for understanding this paper.

• $o_{stay}$, $o_{delete+gen}$, $o_{spec+gen}$: $\not\uparrow$G $\not\uparrow$L

• $o_{delete+add}$, $o_{spec+add}$, $o_{move}$, $o_{change}$: $\uparrow$G

Before concluding this section, we briefly consider a different partition of the learning
operators than that reflected in the taxonomy of Figure 7. This different partition is
necessary for understanding the a priori proofs about the preservation of Response properties (in
Section 4). For this partition, we wish to distinguish those operators that can introduce at
least one new string with an infinitely repeating substring (e.g., (a,b,c,d,e,d,e,d,e,...) where
the ellipsis represents infinite repetition of d followed by e) into the FSA language versus
those that cannot. Any operator that can add atoms to the transition condition for an edge
in a cycle, add an edge to an existing cycle, or add an edge to create a new cycle belongs
to the first class (the class that can add such substrings). Thus this first class includes our
operators that can create new cycles (e.g., $o_{stay}$ because it can add a new edge $(v_1, v_1)$), as
well as our operators that can generalize the transition condition along some edge of a cycle
(e.g., $o_{delete+gen}$ because it can generalize $M_{\mathcal{K}}(v_1, v_1)$). The operators are divided between
these two classes as follows:

1. $o_{add}$, $o_{gen}$, $o_{add \vee gen}$, $o_{add\text{-}action}$, $o_{stay}$, $o_{delete+gen}$, $o_{spec+gen}$, $o_{delete+add}$, $o_{spec+add}$, $o_{move}$,
$o_{change}$

2. $o_{delete}$, $o_{spec}$, $o_{delete \vee spec}$, $o_{delete\text{-}action}$

It is important to note that all of the two-step operators are in the first class. 

At this point we have defined a set of useful operators (via their operator schemas) that
one could apply to an FSA plan for adaptation. With these operators, it is possible to
improve the effectiveness of a plan, and to adapt it to handle previously unforeseen external
and internal conditions. To ensure the usefulness of these learning operators, the learner
needs to check that it has not generated a useless plan (i.e., $\mathcal{L}(S) = \emptyset$). Although not
addressed in this paper, we are currently developing efficient methods for making this check
using the knowledge of the learning that was done.

The particular choice of learning operators presented here was motivated by four factors. 
First, these operators translate into easy-to-implement perturbations of entries in a table, 
which is the representation of FSAs used in our implementation (see Section 6). Second, 
these operators were inspired by the literature. For example, generalization and special- 
ization operators are considered fundamental for inductive inference (Michalski, 1983), and 
deleting/adding FSA edges is effective for evolving FSAs (Fogel, 1996). Third, these
operators made practical sense in the context of applications that were considered. Fourth, the
particular taxonomies presented here facilitate powerful theoretical and empirical results 
for reducing the time complexity of reverification, as shown in the remainder of this paper. 

4. A Priori Results about the Safety of Machine Learning Operators 

Subsection 3.4 defined several useful learning operator schemas to modify automaton edges
($o : E(S) \to E(S')$) and the transition conditions along edges ($o : M_{\mathcal{K}}(S) \to M_{\mathcal{K}}(S')$). The
results in this section establish which of these operator schemas $o$ are a priori guaranteed to
preserve two property classes of interest (Invariance and Response). This section assumes
that all learning operators are applied to a single FSA plan, i.e., $SIT_{1agent}$ or $SIT_{1plan}$.
Section 5 addresses the translation of the operators applied to a single plan into their effect
on a product plan (for $SIT_{multplans}$), and how this affects the results. We begin by formally
defining what we mean by "safe machine learning operator."

4.1 "Safe" Online Machine Learning 

Our objective is to lower the time complexity of reverification. The ideal solution is to
identify safe machine learning methods (SMLs), which are machine learning operators that are
a priori guaranteed to preserve properties (also called "correctness preserving mappings")
and require no run-time reverification. For a plan $S$ and property $P$, suppose verification
has succeeded prior to learning, i.e., $\forall x$, $x \in \mathcal{L}(S)$ implies $x \models P$ (i.e., $S \models P$). Then a
machine learning operator $o(S)$ is an SML if and only if verification is guaranteed to succeed
after learning. In other words, if $S' = o(S)$, then $S \models P$ implies $S' \models P$.

Subsection 4.2 provides results about the a priori safety of machine learning operators. 
Some of the results in Subsection 4.2 are negative. Nevertheless, although we do not have 
an a priori guarantee for these learning operators, Section 6 shows that we can perform
reverification more efficiently than total reverification from scratch. 

4.2 Theoretical Results 

Let us begin by considering the primitive operators. The results for all primitive operators
are corollaries of two fundamental theorems, Theorems 1 and 2, which may not be
immediately intuitive. For example, it seems reasonable to suspect that if an edge is deleted
somewhere along the path from a trigger to a response, then this could cause failure of a
Response property to hold because the response is no longer accessible. In fact, this is not
true. What actually happens is that deletions reduce the number of strings in the language.
If the original language satisfies the property then so does the smaller language. Theorem
1 formalizes this.

Theorem 1 Let $S'$ be an FSA with Boolean algebra $\mathcal{K}$. Let $S$ be identical to $S'$, but with
additional edges, i.e., $o : S' \to S$ is defined as $o : E(S') \to E(S)$, where $E(S') \subseteq E(S)$.
Then $\mathcal{L}(S') \subseteq \mathcal{L}(S)$.

Proof. The language may be enlarged by the addition of new edges that have newly learned
transition conditions. On the other hand, because every accepting run remains an accepting
run regardless of new edges, $x \in \mathcal{L}(S')$ implies $x \in \mathcal{L}(S)$, and we are never reducing the
size of the language. Therefore, $\mathcal{L}(S') \subseteq \mathcal{L}(S)$. □

The results about the machine learning operator schemas $o_{delete}$ and $o_{add}$ follow as
corollaries:

Corollary 1 $o_{delete}$ is an SML with respect to any property $P$.

Proof. Assume $S \models P$. Then $\forall x$, $x \in \mathcal{L}(S)$ implies $x \models P$. Define $o_{delete}(S) = S'$. By
Theorem 1, $\mathcal{L}(S') \subseteq \mathcal{L}(S)$. Therefore, $\forall x$, $x \in \mathcal{L}(S')$ implies $x \models P$. We conclude that
$S' \models P$, i.e., $o_{delete}(S) \models P$. □

To be consistent with Theorem 1, in Corollary 2 only (but not in the rest of the paper), we
use $S'$ for the pre-$o_{add}$ FSA and $S$ for the post-$o_{add}$ FSA, i.e., $o_{add}(S') = S$.

Corollary 2 $o_{add}$ is not necessarily an SML for any property, including Invariance and
Response properties.

Proof. Assume $S' \models P$. Then $\forall x$, $x \in \mathcal{L}(S')$ implies $x \models P$. By Theorem 1, $\mathcal{L}(S') \subseteq
\mathcal{L}(S)$. Because $\mathcal{L}(S)$ may contain strings that are not in $\mathcal{L}(S')$, we cannot be certain
that $S \models P$, i.e., that $o_{add}(S') \models P$. For instance,
a counterexample for Invariance property $\Box \neg p$ occurs if we add an accessible edge with
transition condition $p$. □

Now we consider a priori results for $o_{spec}$ and $o_{gen}$. Again, we begin with a relevant
theorem for operator schema $o$.

Theorem 2 Let $S'$ be an FSA with Boolean algebra $\mathcal{K}$, and let $o : S' \to S$ be defined as
$o : M_{\mathcal{K}}(S') \to M_{\mathcal{K}}(S)$ where $\exists z \in \mathcal{K}$, $z \neq 0$, $(v_1, v_3) \in E(S')$, such that
$o(M_{\mathcal{K}}(v_1, v_3)) = M_{\mathcal{K}}(v_1, v_3) \vee z$. Then $\mathcal{L}(S') \subseteq \mathcal{L}(S)$.

Proof. Similar to the proof of Theorem 1. □

Corollary 3 $o_{spec}$ is an SML for any property.

Proof. Similar to the proof of Corollary 1 of Theorem 1. □

Corollary 4 $o_{gen}$ is not necessarily an SML for any property, including Invariance and
Response properties.






Proof. Similar to the proof of Corollary 2 of Theorem 1. □ 

We can draw the following conclusions from the theorems and corollaries just presented: 

• Of the one-step learning operators, those that are guaranteed to be SMLs for any
property are $o_{delete}$, $o_{spec}$, and $o_{delete \vee spec}$ (which implies that $o_{delete\text{-}action}$ is also an
SML for any property).

• We need never be concerned with the first step in a two-step operator. It is guaranteed
to be an SML (because $o_{delete}$ or $o_{spec}$ is always the first step).

Next consider theorems that are needed to address the two-step operators. Although we 
found results for the one-step operators that were general enough to address any property, 
we were unable to do likewise for the two-step operators. Our results for the two-step op- 
erators determine whether these operators are necessarily SMLs for Invariance or Response 
properties in particular. Future work will consider other property classes. The theorems 
are quite intuitive. The first theorem distinguishes those learning operators that will satisfy 
Invariance properties from those that will not: 

Theorem 3 A machine learning operator is guaranteed to be an SML with respect to any
Invariance property $P$ if and only if $\not\uparrow$G and $\not\uparrow$L are both true (which, for our two-step
operators, implies that the operator satisfies the accessibility condition).

Proof. Suppose $\not\uparrow$G and $\not\uparrow$L are both true. Let Invariance property $P = \Box \neg p$. Assume $P$
is true of FSA $S$ prior to learning. Then for every string $y \in \mathcal{L}(S)$, it must be the case that
$\neg p$ is true in every c-state of $y$. If accessibility of atoms is not increased (i.e., $\not\uparrow$G and
$\not\uparrow$L), then it must be the case that every c-state of every string $x \in \mathcal{L}(S')$, where $S' = o(S)$,
is also a c-state of some string in $\mathcal{L}(S)$. Therefore, for every string $x \in \mathcal{L}(S')$, it must be
the case that $\neg p$ is true in every c-state of $x$. In other words, moving transition conditions
around in an FSA without increasing accessibility will not alter the truth of an Invariance
property, which holds in every c-state of every string in the language of the FSA.

Suppose $\uparrow$G or $\uparrow$L. Increasing accessibility of atoms implies the possibility of introducing
a c-state in some string $x \in \mathcal{L}(S')$, where $S' = o(S)$, that was not in any string of $\mathcal{L}(S)$.
This can cause violation of an Invariance property, as in the counterexample in the proof
of Corollary 2. Knowing that $\neg p$ is true in every c-state of every string of $\mathcal{L}(S)$ provides no
guarantee that $\neg p$ is true in every c-state of every string of $\mathcal{L}(S')$. □

Since we already have results to cover the one-step operators, we need only consider the 
two-step operators. 

Corollary 5 The machine learning operator schemas $o_{delete+gen}$, $o_{spec+gen}$, and $o_{stay}$ are
guaranteed to be SMLs with respect to any Invariance property $P$ because for all of these
operators $\not\uparrow$G and $\not\uparrow$L.

Corollary 6 The machine learning operator schemas $o_{delete+add}$, $o_{spec+add}$, $o_{move}$, and
$o_{change}$ are not necessarily SMLs with respect to any Invariance property $P$ because for
all of these operators $\uparrow$G.
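
Theorem 3 and its corollaries can be illustrated on a toy plan. The sketch below makes two simplifying assumptions that are not in the text: transition conditions are sets of atoms, and the Invariance property $\Box \neg p$ is read as "atom p appears on no accessible edge." The plan, the state names, and the helper names are all hypothetical.

```python
from collections import deque


def accessible_atoms(M, initial):
    """Atoms on edges leaving states that are reachable from an initial state."""
    seen, queue, atoms = set(initial), deque(initial), set()
    while queue:
        v = queue.popleft()
        for (src, dst), cond in M.items():
            if src == v and cond:
                atoms |= cond
                if dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return atoms


def invariance_holds(M, initial, p):
    """Box-not-p, read here as: atom p is on no accessible edge."""
    return p not in accessible_atoms(M, initial)


# Hypothetical plan: atom "p" sits on an edge out of U, which A cannot reach.
S = {
    ("A", "A"): frozenset({"w"}),
    ("A", "B"): frozenset({"x", "y"}),
    ("B", "A"): frozenset({"z"}),
    ("U", "A"): frozenset({"p"}),
}
print(invariance_holds(S, ["A"], "p"))       # True

# Move "y" from (A, B) to the existing edge (A, A): an o_spec+gen instance.
moved = dict(S)
moved[("A", "B")] = frozenset({"x"})
moved[("A", "A")] = frozenset({"w", "y"})
print(invariance_holds(moved, ["A"], "p"))   # True

# o_add of a new edge (B, U) makes U, and hence "p", accessible.
added = dict(S)
added[("B", "U")] = frozenset({"q"})
print(invariance_holds(added, ["A"], "p"))   # False
```

Note that the added edge does not carry "p" itself; the violation arises from a global increase in accessibility downstream of the new edge.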
Figure 9: The automata S1 (left) and S1' (right).

The next theorem characterizes those learning operators that cannot be guaranteed to 
be SMLs with respect to Response properties. 

Theorem 4 Any machine learning operator schema that can introduce a new string with
an infinitely repeating substring into the FSA language cannot be guaranteed to be an SML
for Response properties.

Proof. Assume FSA S satisfies a Response property prior to learning. Therefore every 
string accepted by S satisfies the property. For each accepted string, every instance (or 
the first instance if it is a First-Response property) of the trigger is eventually followed 
by a response. Suppose the machine learning operator introduces a new string with an 
infinitely repeating substring into the language. Then it is possible that the prefix of this 
string before the infinitely repeating substring includes a trigger and no response, and the 
infinitely repeating substring does not include a response. □ 

Since we already have results to cover the one-step operators, we need only consider the 
two-step operators. 

Corollary 7 None of the two-step learning operators can be guaranteed to be SMLs with
respect to Response properties because they are in the first class in the partition related to
this theorem, i.e., they may introduce strings with infinitely repeating substrings.

Consider a couple of illustrative examples of Theorem 4 and its corollary, using Figure 9.
Prior to learning (the FSA on the left of Figure 9), $\forall x$, where $x \in \mathcal{L}(S1)$, $x \models P3$, for
Response property $P3 = \Box (a \to \Diamond d)$. Assume operator $o_{stay} : S1 \to S1'$ deletes edge
(STATE2, STATE3) and generalizes the transition condition on edge (STATE2, STATE2)
to "$e \vee a$" (see Figure 9 on the right). Then the string consisting of b followed by infinitely
many a's (b,a,a,a,...) is in $\mathcal{L}(S1')$ but does not satisfy $P3$. This helps us to see why $o_{stay}$ is
not necessarily an SML for Response properties. The same example illustrates why $o_{delete+gen}$ cannot be
guaranteed to be an SML for Response properties. For $o_{spec+gen}$, suppose the condition for
(STATE2, STATE3) is "$f \vee a$" in S1, and "$f \wedge \neg a$" in S1', but everything else is the same
as in Figure 9. Again, we can see the problem for Response properties.
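
The effect described by Theorem 4 can be checked mechanically on a small example. The sketch below is only loosely modeled on the description of Figure 9: the edge tables are hypothetical, conditions are modeled as sets of atoms, and every infinite run is assumed to be accepted (no separate acceptance condition). Under those assumptions, $\Box(\textit{trigger} \to \Diamond \textit{response})$ fails exactly when some accessible edge allows the trigger and leads to a state from which the response can be avoided forever.

```python
from collections import deque


def accessible_states(M, initial):
    """States reachable from the initial states along nonzero edges."""
    seen, queue = set(initial), deque(initial)
    while queue:
        v = queue.popleft()
        for (src, dst), cond in M.items():
            if src == v and cond and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def can_avoid_forever(M, response):
    """States with an infinite path on which `response` never has to occur."""
    alive = {s for edge in M for s in edge}
    changed = True
    while changed:
        changed = False
        for s in list(alive):
            has_exit = any(src == s and dst in alive and cond - {response}
                           for (src, dst), cond in M.items())
            if not has_exit:
                alive.discard(s)
                changed = True
    return alive


def response_holds(M, initial, trigger, response):
    """Box(trigger -> Diamond response), assuming every infinite run is accepted."""
    reach = accessible_states(M, initial)
    stuck = can_avoid_forever(M, response)
    return not any(src in reach and trigger in cond and dst in stuck
                   for (src, dst), cond in M.items())


# Hypothetical stand-in for S1: the trigger "a" always leads to the response "d".
S1 = {
    ("STATE1", "STATE2"): frozenset({"b"}),
    ("STATE2", "STATE2"): frozenset({"e"}),
    ("STATE2", "STATE3"): frozenset({"a"}),
    ("STATE3", "STATE1"): frozenset({"d"}),
}
# o_stay: delete (STATE2, STATE3) and move "a" onto the self-loop.
S1_prime = {
    ("STATE1", "STATE2"): frozenset({"b"}),
    ("STATE2", "STATE2"): frozenset({"e", "a"}),
    ("STATE3", "STATE1"): frozenset({"d"}),
}
print(response_holds(S1, ["STATE1"], "a", "d"))        # True
print(response_holds(S1_prime, ["STATE1"], "a", "d"))  # False
```

After the move, the self-loop on STATE2 admits the string b,a,a,a,..., which contains the trigger but never the response.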

We conclude by summarizing the positive a priori results: 

• $o_{delete}$, $o_{spec}$, $o_{delete \vee spec}$ and $o_{delete\text{-}action}$ are SMLs for any property (expressible in
temporal logic).

• $o_{delete+gen}$, $o_{spec+gen}$ and $o_{stay}$ are SMLs for Invariance properties,

and the negative a priori results:

• $o_{add}$, $o_{gen}$, $o_{add \vee gen}$, $o_{add\text{-}action}$, $o_{spec+add}$, $o_{delete+add}$, $o_{move}$ and $o_{change}$ are not
necessarily SMLs for Invariance or Response properties.

• $o_{delete+gen}$, $o_{spec+gen}$ and $o_{stay}$ are not necessarily SMLs for Response properties.
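
These a priori results amount to a lookup that a learner could consult before deciding whether to reverify. The following sketch encodes only the single-plan results listed above; the operator name strings and the function `needs_reverification` are hypothetical, and the table does not cover the product FSA discussed in Section 5.

```python
# A priori results of Section 4 for a single FSA plan (not the product FSA).
SML_FOR_ANY_PROPERTY = {"o_delete", "o_spec", "o_delete_or_spec", "o_delete_action"}
SML_FOR_INVARIANCE = {"o_delete+gen", "o_spec+gen", "o_stay"}


def needs_reverification(operator, property_class):
    """Return False only when the operator is a priori an SML for the class.

    property_class is "invariance", "response", or any other label.
    """
    if operator in SML_FOR_ANY_PROPERTY:
        return False
    if operator in SML_FOR_INVARIANCE and property_class == "invariance":
        return False
    return True   # no a priori guarantee: reverify after learning


print(needs_reverification("o_stay", "invariance"))  # False
print(needs_reverification("o_stay", "response"))    # True
print(needs_reverification("o_add", "invariance"))   # True
```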

The fact that all three learning operators that satisfy the accessibility condition are
guaranteed to be SMLs for Invariance properties is significant, because Invariance properties
are extremely useful and common for verifying systems and many important applications
need only test properties of this class (Heitmeyer et al., 1998).

Finally, from Theorems 1 and 2 we learned that the heart of the problem for all of
the negative results is either an $o_{gen}$ step or an $o_{add}$ step. Later in this paper we address
these troublesome steps by finding more efficient methods for dealing with them than total
reverification from scratch. However, first, in the next section, we consider how our a priori
results are translated from a single to a product FSA for $SIT_{multplans}$.

5. Translating Learning Operators to a Product Automaton 

In this section we address $SIT_{multplans}$, where each agent maintains and uses its own
individual FSA, but for verification the product FSA needs to be formed and verified. For
$SIT_{multplans}$, a learning operator is applied to an individual agent FSA and then the product
is formed. Therefore, it is necessary to consider the translation of each learning operator
from individual to product FSA, and how that affects the a priori SML results presented
above.

For operators $o_{spec+gen}$, $o_{delete+gen}$, $o_{spec+add}$, and $o_{delete+add}$, we consider only the
translations of the primitive operators. This is because the translations of these operators are
simply translations of their primitive components. The remaining translations are:

• $o_{spec}$ translates to $o_{spec}$ and/or $o_{delete}$.

• $o_{delete}$ translates to $o_{spec}$ and/or $o_{delete}$.

• $o_{gen}$ translates to $o_{gen}$ and/or $o_{add}$.

• $o_{add}$ translates to $o_{gen}$ and/or $o_{add}$.

• $o_{stay}$ translates to $o_{stay}$ and/or $o_{move}$.

• $o_{move}$ translates to $o_{move}$.

• $o_{change}$ translates to $o_{change}$.

It may not be intuitive to the reader how $o_{gen}$ can translate to $o_{add}$. To illustrate, we
use Figure 10, where the transition conditions, such as ($a \vee c$), denote sets of multiagent
actions. Suppose $o_{gen}$ is applied to edge (1, 2) in the leftmost FSA so that the transition
condition is now ($d \vee b$). Then a new edge (11', 21') is added to the product FSA (rightmost
in Figure 10) with the transition condition b. Recall that to form the product FSA we take
the Cartesian product of the vertices and the intersection of the transition conditions.
Likewise, $o_{spec}$ translates to either $o_{spec}$ or $o_{delete}$ in the product FSA.

Figure 10: Generalization can become addition in product.
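
The way a generalization in one plan surfaces as an edge addition in the product can be sketched directly from the product construction (Cartesian product of vertices, intersection of conditions). The single-edge plans below are hypothetical and are not the exact labels of Figure 10; conditions are again modeled as sets of atoms, and `product` is an illustrative helper name.

```python
def product(MA, MB):
    """Product of two edge tables: paired vertices, intersected conditions."""
    out = {}
    for (s1, t1), c1 in MA.items():
        for (s2, t2), c2 in MB.items():
            cond = c1 & c2
            if cond:                      # condition 0 means no product edge
                out[((s1, s2), (t1, t2))] = cond
    return out


# Hypothetical single-edge plans; these are not the exact labels of Figure 10.
A = {(1, 2): frozenset({"a"})}
B = {("1'", "2'"): frozenset({"b", "c"})}
edge = ((1, "1'"), (2, "2'"))

print(edge in product(A, B))              # False

A_gen = {(1, 2): A[(1, 2)] | frozenset({"b"})}   # o_gen with z = b
print(sorted(product(A_gen, B)[edge]))    # ['b']
```

Before learning, the two conditions are disjoint, so the product has no edge between the paired vertices; after generalization the intersection is nonzero and a new product edge appears.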

To illustrate why $o_{stay}$ can become $o_{move}$ in the product, we use Figure 3. Suppose
we delete the edge (TRANSMITTING, RECEIVING) and move the transition condition
to edge (TRANSMITTING, TRANSMITTING). Then the global state (DELIVERING,
DELIVERING, TRANSMITTING) becomes accessible from initial state (COLLECTING,
RECEIVING, TRANSMITTING) by taking multiagent action (F-deliver $\wedge$ I-receive $\wedge$
L-transmit). Previously, that multiagent action forced the product FSA to go to
(DELIVERING, DELIVERING, RECEIVING).

What implications do these translations have for the safety of the learning operators
for the product FSA? The positive a priori results for $o_{delete+gen}$, $o_{spec+gen}$, and $o_{stay}$ for
preserving Invariance properties become negative for the product. This is because $o_{gen}$ may
become $o_{add}$ and $o_{stay}$ may become $o_{move}$. On the other hand, the positive a priori results
for $o_{delete}$, $o_{spec}$, $o_{delete \vee spec}$ and $o_{delete\text{-}action}$ preserving all properties remain positive for
the product. For $o_{delete}$, $o_{spec}$, $o_{delete \vee spec}$, and $o_{delete\text{-}action}$, this implies that the product
FSA never needs to be formed, reverification does not have to be done, and thus there is
no run-time cost, even for multiple agents learning autonomously. As mentioned above,
the troublesome parts of all operators are due to their $o_{gen}$ or $o_{add}$ component. In the
next section we develop methods for reducing the complexity of reverification over total
reverification from scratch when these operators have been applied.

6. Incremental Reverification 

Recall that operators $o_{spec}$ and $o_{delete}$ cannot cause problems with the safety of learning,
whereas $o_{gen}$ and $o_{add}$ are risky (i.e., are not a priori guaranteed to be SMLs). Furthermore,
$o_{gen}$ and $o_{add}$ can cause problems when they are the second step in a two-step operator.
Fortunately, we have developed incremental reverification algorithms for these operators
that can significantly decrease the time complexity over total reverification from scratch.

Recall that there are two ways that operators can alter accessibility: globally (G) or
locally (L). Furthermore, recall that $o_{add}$ can increase accessibility either way ($\uparrow$G $\uparrow$L),
whereas $o_{gen}$ can only increase accessibility locally ($\not\uparrow$G $\uparrow$L). We say that $o_{gen}$ has only
a "localized" effect on accessibility, whereas the effects of $o_{add}$ may ripple through many
parts of the FSA. The implication is that we can have very efficient incremental methods
for reverification tailored for $o_{gen}$, whereas we cannot do likewise for $o_{add}$. In other words, a
more localized effect on accessibility implies that it is easier to localize reverification to gain
speed. This is also true for both two-step operators that have $o_{gen}$ as their second step, i.e.,
$o_{delete+gen}$ and $o_{spec+gen}$ are amenable to incremental (localized) reverification. Because
no advantage is gained by considering $o_{add}$ per se, we develop incremental reverification
algorithms for the most general operator $o_{change}$. These algorithms apply to $o_{add}$ and all
other special cases of $o_{change}$.

We have developed two types of incremental reverification algorithms: those that follow
the application of o_gen and those that follow the application of o_change. For each of our
learning operators, one or more of these algorithms is applicable. Before presenting the
incremental algorithms, Subsection 6.1 presents two algorithms for total reverification from
scratch, namely, one for Invariance properties and the other for all properties expressible as
FSAs, as well as an algorithm for taking the tensor product of the FSAs. These algorithms
apply to SIT_1agent, SIT_1plan, or SIT_multplans. Subsection 6.2 gives incremental versions of
all the algorithms in Subsection 6.1. These algorithms are applicable when the learning
operator is o_change or any of its special cases. Furthermore, they apply to any of SIT_1agent,
SIT_1plan, or SIT_multplans. Subsection 6.3 has incremental algorithms for SIT_1agent and
SIT_1plan, learning operator o_gen, and Invariance and full Response properties in particular.
The section concludes with theoretical and empirical results comparing the time complexity
of the incremental algorithms with the time complexity of the corresponding total version
(as well as with each other).

The goal in developing all of the incremental reverification algorithms is maximal
efficiency. These algorithms make the assumption that S ⊨ P prior to learning, which means
that any errors found on previous verification(s) have already been fixed. Then learning
occurs (o(S) = S′), followed by the incremental reverification algorithm (see Figure 1). Next
let us consider the soundness and completeness of the algorithms, where we assume normal
termination. All of the incremental reverification algorithms presented here are sound (i.e.,
whenever they conclude that reverification succeeds, it is in fact true that S′ ⊨ P) for
"downstream" properties and "directionless" properties for which the negation is
expressible as a Büchi FSA. Downstream properties (which include Response) check sequences of
events in temporal order, e.g., whether every p is followed by a q. In contrast, "upstream"
properties check for events in reverse temporal order, e.g., whether every q is preceded by
a p (see footnote 10). Directionless properties, such as Invariance, impose no order for
checking. Some of the incremental algorithms are also complete, i.e., whenever they conclude
that reverification fails, it is in fact true that S′ ⊭ P. (The reader should avoid confusing
"complete algorithm" with "complete FSA.")

When reverification fails, it does so because of one or more errors, where an "error"
implies there is a property violation (S′ ⊭ P). There are two ways to resolve such errors.
Either return to the prelearning FSA(s) and choose another learning operator and reverify
again, or keep the results of learning but repair the FSA(s) in some other way to fix the
error. With one exception, the complete algorithms in this section find all true errors
introduced by learning. The algorithms that are not complete may also find false errors.
Any algorithm that finds all and only true errors can resolve these errors in either of the two
ways. An algorithm that does not find all errors or finds false ones requires more restricted
error resolution. In particular, it can only be used with the first method for resolving errors,
which consists of choosing another learning operator. The algorithms that are sound but not
complete (can find false errors) are overly cautious. In other words, they may recommend
avoiding a learning operator when in fact the operator may be safe to apply.

10. William Spears (personal communication) identified the upstream/downstream distinction as being
relevant to the applicability of the incremental algorithms described here.

Table 1: The transition function for agent L's FSA plan. The rows correspond to states
and the columns correspond to multiagent actions.

Before presenting the incremental algorithms, we first present algorithms for total rever- 
ification from scratch. These algorithms do not assume that learning has occurred, and they 
apply to all situations. They are more general (not tailored for learning), but less efficient, 
than our incremental algorithms. 

6.1 Product and Total (Re)verification Algorithms for All Situations

For implementation efficiency, all of our algorithms assume that FSAs are represented using
a table of the transition function δ(v_1, a) = v_2, which means that for state v_1, taking action
a leads to next state v_2, as shown in Table 1. Rows correspond to states and columns
correspond to multiagent actions. This representation is equivalent to the more visually
intuitive representation of Figures 2 and 3. In particular, Table 1 is equivalent to the FSA
in Figure 3 for the lander agent L. In Table 1, states are abbreviated by their first letter,
and the multiagent actions are abbreviated by their first letters. For example, "crt" means
agent F takes action (F-collect), I takes (I-receive), and L takes (L-transmit). The table
consists of entries for the next state, i.e., it corresponds to the transition function. A "0" in
the table means that there is no possible transition for this state-action pair. One situation
in which this occurs is when an action is not allowed from a state. Consider an example
use of the table format for finite-state automata. According to the first (upper leftmost)
entry in Table 1, if L is in state TRANSMITTING ("T") and F takes action F-collect, I
takes I-receive, and L takes L-transmit (which together is multiagent action "crt"), then
L will transition to its RECEIVING ("R") state, i.e., δ(T, crt) = R. With this tabular
representation, o_change is implemented as a perturbation (mutation) operator that changes
a table entry to another randomly chosen value for the next state. Operator o_gen is a
perturbation operator that changes an entry to a next state already appearing in that row.
For example, generalizing the transition condition along edge (T,R) can be accomplished
by changing one of the 0s to an R in the first row of Table 1. This is because the transition
condition associated with edge (T,R) is the set of all multiagent actions that transition from
T to R, i.e., {crt, drt} in Table 1. This set is expressed in Boolean algebra as (I-receive ∧
L-transmit) (see Figure 3).

Suppose there are n agents, and 1 ≤ j_k ≤ the number of states in the FSA for agent k.
Then the algorithm forms all product states v = (v_j1, ..., v_jn) and specifies their transitions:

Procedure product
    for each product state v = (v_j1, ..., v_jn) do
        if all v_jk, 1 ≤ k ≤ n, are initial states, then v is a product initial state
        endif
        for each multiagent action a_i do
            if (δ(v_jk, a_i) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_i) = 0
            else δ(v, a_i) = (δ(v_j1, a_i), ..., δ(v_jn, a_i)); endif
        endfor
    endfor
end procedure

Figure 11: Total_prod product algorithm.
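The tabular representation and the two perturbation operators can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary layout, the function names, and the column label "xyz" are assumptions made for the example (only the entries δ(T, crt) = R and δ(T, drt) = R come from the text), and restricting o_gen to empty (0) entries is one reading of the description above.

```python
import random

NO_TRANSITION = 0  # the "0" table entry: no transition for this state-action pair

# table[state][action] -> next state. Only delta(T, crt) = R and delta(T, drt) = R
# are taken from the text; the "xyz" column and the R row are made up.
table = {
    "T": {"crt": "R", "drt": "R", "xyz": NO_TRANSITION},
    "R": {"crt": NO_TRANSITION, "drt": "T", "xyz": "T"},
}

def o_change(table, state, action, states, rng=random):
    """Perturb one entry to a different, randomly chosen next state (or 0)."""
    old = table[state][action]
    choices = [s for s in list(states) + [NO_TRANSITION] if s != old]
    table[state][action] = rng.choice(choices)
    return old, table[state][action]

def o_gen(table, state, action, rng=random):
    """Generalize an existing edge: fill a 0 entry with a next state that
    already appears in the same row (assumption: only 0 entries are filled)."""
    if table[state][action] != NO_TRANSITION:
        raise ValueError("o_gen sketch only fills an empty (0) entry")
    row_targets = sorted({t for t in table[state].values() if t != NO_TRANSITION})
    if not row_targets:
        raise ValueError("row has no existing edge to generalize")
    table[state][action] = rng.choice(row_targets)
    return table[state][action]

# Generalizing edge (T, R): the only target in row T is R.
new_target = o_gen(table, "T", "xyz")
```

After the call, the transition condition on edge (T, R) in this toy table is the set {crt, drt, xyz}, which is the sense in which changing a 0 to an R generalizes the edge.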
For SIT_multplans, prior to verification the multiagent product FSA S needs to
be formed from the individual agent FSAs (see Figure 1). We can implement the algorithm
Total_prod for generating the product FSA using the data structure of Table 1 as shown in
Figure 11. In the product FSA, an example product state and transition is δ(CRT, drt)
= DDR because δ(C, drt) = D, δ(R, drt) = D, and δ(T, drt) = R for agents F, I, and L,
respectively. The initial states of the product FSA are formed by testing whether every
individual state of the product is an initial state. For example, if D, D, and R are initial
states for F, I, and L, respectively, then DDR will be an initial state for F ⊗ I ⊗ L. After
forming the product states and specifying which are initial, the algorithm of Figure 11
specifies the δ transition for every product state and multiagent action.
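As a rough illustration of the product construction in Figure 11, the following Python sketch builds the product transition table from per-agent tables. The function name `total_prod`, the tuple encoding of product states, and the filler entries set to 0 are assumptions for this example; only δ(C, drt) = D, δ(R, drt) = D, δ(T, drt) = R and the initial states D, D, R are taken from the text.

```python
from itertools import product as cartesian

NO_TRANSITION = 0

def total_prod(agents, actions):
    """agents: list of (table, initial_states) with table[state][action] -> next
    state or 0. Returns (delta, initial) for the product FSA, where product
    states are tuples with one component per agent."""
    delta, initial = {}, set()
    for v in cartesian(*(sorted(tbl) for tbl, _ in agents)):
        # v is a product initial state iff every component is initial
        if all(s in init for s, (_, init) in zip(v, agents)):
            initial.add(v)
        delta[v] = {}
        for a in actions:
            nxt = tuple(tbl[s].get(a, NO_TRANSITION) for s, (tbl, _) in zip(v, agents))
            # if any agent has no transition on a, neither does the product
            delta[v][a] = NO_TRANSITION if NO_TRANSITION in nxt else nxt
    return delta, initial

# Entries for action "drt" follow the example in the text; the 0 entries are filler.
F = ({"C": {"drt": "D"}, "D": {"drt": 0}}, {"D"})
I = ({"R": {"drt": "D"}, "D": {"drt": 0}}, {"D"})
L = ({"T": {"drt": "R"}, "R": {"drt": 0}}, {"R"})
delta, initial = total_prod([F, I, L], ["drt"])
```

With these toy tables, `delta[("C", "R", "T")]["drt"]` is `("D", "D", "R")`, matching δ(CRT, drt) = DDR, and the only product initial state is DDR.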

Note that the algorithm in Figure 11 forms the product FSA S for testing Invariance
properties. To test First-Response properties using AT verification, we need to form the
product FSA S ⊗ ¬P. To do this simply requires considering ¬P to be the (n + 1)st agent.
The algorithm in Figure 11 is modified by changing n to n + 1 everywhere. It is also
important to note that in all situations (including SIT_1agent), Total_prod must be executed
to form the product S ⊗ ¬P if AT verification is to be done. In SIT_1agent, S is just the
single agent FSA and n is 1. For SIT_1plan, n = 1 also. In other words, for SIT_1plan the
multiagent plan, once formed, is never subdivided and therefore it could be considered like
a single agent plan. In both of these cases, if AT verification is done the product is taken
of the single plan FSA and the property FSA.

Given that the product FSA has been formed if needed, then the final (multi)agent FSA
can be verified. We first consider a very simple model checking algorithm, called Total_I,
tailored specifically for verifying Invariance properties of the form □¬p. The algorithm,
shown in Figure 12, consists of a depth-first search of S beginning in each initial state. Any
accessible atom a_i that is part of a transition condition, where a_i ≤ p, violates the property.
(We store the set of all atoms a_i ≤ p for rapid access.)

Procedure verify
    for each state v ∈ V(S) do
        visited(v) = 0
    endfor
    for each initial state v ∈ I(S) do
        if (visited(v) == 0) then dfs(v); endif
    endfor
end procedure

Procedure dfs(v)
    visited(v) = 1;
    for each atom a_i ∈ Γ(K), a_i ≤ p, do
        if δ(v, a_i) ≠ 0 then print "Verification error"; endif
    endfor
    for each atom a_i ∈ Γ(K) set w = δ(v, a_i) and do
        if (w ≠ 0) and (visited(w) == 0) then dfs(w); endif
    endfor
end procedure

Figure 12: Total_I verification algorithm.
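The depth-first search of Figure 12 translates almost line for line into Python. The sketch below is illustrative only: the name `total_i`, the dictionary encoding of δ, and the toy three-state FSA are assumptions, and `bad_atoms` stands for the stored set of atoms a_i ≤ p.

```python
def total_i(delta, initial, bad_atoms):
    """Check the Invariance property 'always not p' on a transition table.
    delta[v][a] -> next state or 0; bad_atoms: atoms a with a <= p.
    Returns (errors, visited), where each error is an accessible (state, atom)."""
    visited, errors = set(), []

    def dfs(v):
        visited.add(v)
        # an accessible bad atom labelling an existing transition is a violation
        for a, w in delta[v].items():
            if w != 0 and a in bad_atoms:
                errors.append((v, a))
        for a, w in delta[v].items():
            if w != 0 and w not in visited:
                dfs(w)

    for v in initial:
        if v not in visited:
            dfs(v)
    return errors, visited

# Hypothetical FSA: state C takes the bad atom, but C is not accessible from A.
delta = {
    "A": {"x": "B", "bad": 0},
    "B": {"x": "A", "bad": 0},
    "C": {"x": "C", "bad": "A"},
}
errors, visited = total_i(delta, {"A"}, {"bad"})
```

Here verification succeeds because the only transition labelled with a bad atom leaves an inaccessible state. The returned `visited` set is the kind of record that the incremental algorithms later reuse.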

Next we consider an algorithm for verifying any property whose negation is expressible
as a Büchi FSA, including First-Response properties. The reader may wish to review the
high-level description of this AT model checking algorithm presented in Subsection 3.3 before
continuing. Figure 13 gives a basic version of this algorithm from Courcoubetis et al. (1992)
and Holzmann et al. (1996) (see footnote 11). We call this algorithm Total_AT because it is
total automata-theoretic verification. Recall that in AT model checking, the property is
represented as an FSA, and asking whether S ⊨ P is equivalent to asking whether
L(S) ⊆ L(P) for property P. This is equivalent to L(S) ∩ L(¬P) = ∅, which is algorithmically
tested by taking the tensor product of the plan FSA and the FSA corresponding to ¬P. If
L(S ⊗ ¬P) = ∅ then L(S) ⊆ L(P), i.e., S ⊨ P and verification succeeds; otherwise, S ⊭ P and
verification fails. The algorithm of Figure 13 assumes that the negation of the property (¬P)
is expressed as a Büchi automaton and the FSA being searched is S ⊗ ¬P.

Algorithm Total_AT, in Figure 13, actually checks whether S ⊭ P for any property P.
To check if S ⊭ P, we can determine whether L(S ⊗ ¬P) ≠ ∅. This is true if there is some
"bad" state in B(S ⊗ ¬P) reachable from an initial state and reachable from itself, i.e.,
part of an accessible cycle and therefore visited infinitely often. The algorithm of Figure 13
performs this check using a nested depth-first search on the product FSA S ⊗ ¬P. The first
depth-first search begins at initial states and visits all accessible states. Whenever a state
s ∈ B(S ⊗ ¬P) is discovered, it is called a "seed," and a nested search begins to look for a
cycle that returns to the seed. If there is a cycle, this implies the B(S ⊗ ¬P) (seed) state
can be visited infinitely often, and therefore the language is nonempty (i.e., there is some
action sequence in the plan that does not satisfy the property) and verification fails.

11. This algorithm is used in the well-known Spin system (Holzmann, 1991). A modification was made to
the published algorithm for readability, as well as for efficiency, for the case where it is desirable to halt
after the first verification error. This modification makes the nested call first in procedure dfs.

Procedure verify
    for each state v ∈ V(S ⊗ ¬P) do
        visited(v) = 0
    endfor
    for each initial state v ∈ I(S ⊗ ¬P) do
        if (visited(v) == 0) then dfs(v); endif
    endfor
end procedure

Procedure dfs(v)
    visited(v) = 1;
    if v ∈ B(S ⊗ ¬P) then
        seed = v;
        for each state u ∈ V(S ⊗ ¬P) do
            visited2(u) = 0
        endfor
        ndfs(v)
    endif
    for each successor (i.e., next state) w of v do
        if (visited(w) == 0) then dfs(w); endif
    endfor
end procedure

Procedure ndfs(v) /* the nested search */
    visited2(v) = 1;
    for each successor (i.e., next state) w of v do
        if (w == seed) then print "Bad cycle. Verification error";
            break
        else if (visited2(w) == 0) then ndfs(w); endif
        endif
    endfor
end procedure

Figure 13: Total_AT verification algorithm.
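The nested depth-first search of Figure 13 can be sketched in Python as follows. This is a minimal sketch rather than the Spin implementation: the name `total_at`, the successor-list encoding of the product S ⊗ ¬P, and the choice to return the first seed found (instead of printing) are assumptions made for the example.

```python
def total_at(succ, initial, bad):
    """succ[v]: list of successors of v in the product of S and not-P;
    bad: the set of 'bad' (accepting) states of that product.
    Returns a seed lying on an accessible cycle, or None if none exists."""
    visited = set()

    def ndfs(v, seed, visited2):
        # nested search: look for a path from the seed back to itself
        visited2.add(v)
        for w in succ[v]:
            if w == seed:
                return True
            if w not in visited2 and ndfs(w, seed, visited2):
                return True
        return False

    def dfs(v):
        visited.add(v)
        # the nested call comes first, so the search can halt at the first error;
        # visited2 starts empty for every seed, as in the figure
        if v in bad and ndfs(v, v, set()):
            return v
        for w in succ[v]:
            if w not in visited:
                found = dfs(w)
                if found is not None:
                    return found
        return None

    for v in initial:
        if v not in visited:
            found = dfs(v)
            if found is not None:
                return found
    return None

# Hypothetical product: s1 and s2 form an accessible cycle, s3 is inaccessible.
succ = {"s0": ["s1"], "s1": ["s2"], "s2": ["s1"], "s3": ["s3"]}
seed = total_at(succ, ["s0"], {"s2"})
```

A non-None result corresponds to "Bad cycle. Verification error"; `None` corresponds to an empty language, i.e., verification succeeds.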
Suppose there are n agents, and agent i was modified, 1 ≤ i ≤ n.
Operator o_change modified δ(v_i, a_adapt) to be w_i′ for state v_i and multiagent action a_adapt.
1 ≤ j_k ≤ the number of states in the FSA for agent k.
Then the algorithm is:

Procedure product
    for each product state v = (v_j1, ..., v_i, ..., v_jn) formed from state v_i do
        if (δ(v_jk, a_adapt) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_adapt) = 0
        else δ(v, a_adapt) = (δ(v_j1, a_adapt), ..., w_i′, ..., δ(v_jn, a_adapt)); endif
    endfor
end procedure

Figure 14: Inc_prod product algorithm.

Total_I and Total_AT are sound and complete (for any property whose negation is
expressible as a Büchi FSA), and they find all verification errors. Before elaborating on this,
first note that the term "verification error" has a different connotation for Total_I and
Total_AT. For Total_I an error is an accessible bad atom (i.e., an atom a ≤ p where the
property is □¬p). For Total_AT it is an accessible bad state that is part of a cycle. The
reason Total_I is sound is that it flags as errors only those atoms a ≤ p. It is complete and
finds all errors because it does exhaustive search and testing of all accessible atoms. Total_AT
is also sound and complete, for analogous reasons. Because Total_I and Total_AT find all
errors, they can be used with either method of error resolution (i.e., choose another operator
or fix the FSA).

6.2 Incremental Algorithms for o_change and All Situations

All of the algorithms in the previous subsection can be streamlined given that it is known
that a learning operator (in this case, o_change) has been applied. For simplicity, all of our
algorithms assume o_change is applied to a single atom (multiagent action). For example,
we assume that if δ(v_i, a_adapt) = w_i, then o_change(δ(v_i, a_adapt)) = w_i′ where w_i and w_i′ are
states (or 0, implying no next state), and a_adapt is a multiagent action. Since we use the
tabular representation, this translates to changing one table entry.

Figure 14 shows an incremental version of Total_prod, called Inc_prod, which is tailored for
re-forming the product FSA after o_change has been applied. The algorithm of Figure 14 is
for Invariance properties; for AT verification change n to n + 1 in the algorithm and assume
¬P is the (n + 1)st agent. Although Inc_prod is applicable in all situations when taking the
product with the property FSA, the primary motivation for developing this algorithm was
the multiagent SIT_multplans. Recall that in this situation, every time learning is applied to
an individual agent FSA, the product must be re-formed to verify global properties. The
wasted cost of doing this motivated the development of this algorithm.

Algorithm Inc_prod assumes that the product was formed originally (before learning)
using Total_prod. Inc_prod capitalizes on the knowledge of what (individual or multiagent)
state (v_i) and multiagent action (a_adapt) transition to a new next state as specified by
operator o_change. This algorithm assumes that the prelearning product FSA is stored. Then
the only product FSA states whose next state needs to be modified are those states that
include v_i and transition on action a_adapt. The method for reverification that is assumed to
follow Inc_prod is total reverification, i.e., Total_I or Total_AT.

Procedure product
    I(S) = ∅;
    for each product state v = (v_j1, ..., v_i, ..., v_jn) formed from state v_i do
        if visited(v) then I(S) = I(S) ∪ {v}; endif
        if (δ(v_jk, a_adapt) == 0) for some k, 1 ≤ k ≤ n, then δ(v, a_adapt) = 0
        else δ(v, a_adapt) = (δ(v_j1, a_adapt), ..., w_i′, ..., δ(v_jn, a_adapt)); endif
    endfor
end procedure

Figure 15: Inc_prod-NI product algorithm: a variation of Inc_prod that gets new initial states.
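The incremental product update can be sketched in Python as below. The sketch assumes the stored prelearning product is a dictionary keyed by tuples of component states; the name `inc_prod`, that encoding, and the two toy agents are assumptions for illustration.

```python
from itertools import product as cartesian

def inc_prod(delta, agents, i, v_i, a_adapt, w_new):
    """Update the stored product table 'delta' in place after o_change set
    agent i's entry for (v_i, a_adapt) to w_new. agents is a list of per-agent
    tables table[state][action] -> next state or 0. Only product states that
    contain v_i are revisited, and only their a_adapt entry is recomputed."""
    agents[i][v_i][a_adapt] = w_new
    # fix component i to v_i and range over all states of the other agents
    axes = [[v_i] if k == i else sorted(tbl) for k, tbl in enumerate(agents)]
    touched = []
    for v in cartesian(*axes):
        nxt = tuple(tbl[s].get(a_adapt, 0) for s, tbl in zip(v, agents))
        delta[v][a_adapt] = 0 if 0 in nxt else nxt
        touched.append(v)
    return touched

# Two hypothetical agents with a single multiagent action "x".
A = {"a0": {"x": "a1"}, "a1": {"x": "a0"}}
B = {"b0": {"x": "b0"}, "b1": {"x": "b1"}}
delta = {(p, q): {"x": (A[p]["x"], B[q]["x"])} for p in A for q in B}

# o_change redirects delta_A(a0, x) from a1 to a0.
touched = inc_prod(delta, [A, B], 0, "a0", "x", "a0")
```

Only the two product states containing a0 are recomputed here, rather than all four. The Inc_prod-NI variant of Figure 15 would additionally keep those touched states that were marked visited by the previous verification, e.g. `[v for v in touched if v in visited]`, as the new initial states.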

Next, consider another pair of product and reverification algorithms that is expected
to be, overall, potentially even more efficient. The goal is to streamline reverification after
o_change. This requires a few simple changes to the algorithms. The motivation for these
changes is that when model checking downstream properties, o_change has only "downstream
effects," i.e., it only affects the accessibility of vertices and atoms altered by o_change or those
that would be visited by verification after those altered by o_change.

Consider the changes. We start by building a set of the Cartesian product states v =
(v_j1, ..., v_i, ..., v_jn) that are formed from the state v_i that was affected by learning. The
first way that we can shorten reverification is by using these states as the new initial states
for reverification. In fact, we need only select those that were visited during the original
verification (i.e., are accessible from the original initial states). In other words, suppose
for agent i, o_change modified δ(v_i, a_adapt). Then we reinitialize the set of initial states to be
∅ and add all product states formed from v_i that were marked "visited" during previous
verification. This can be done by modifying the product algorithm of Figure 14 as shown in
Figure 15. The algorithm of Figure 15 is to form the product FSA for verifying Invariance
properties. To form the product for AT verification, substitute I(S ⊗ ¬P) for I(S) and
(n + 1) for n in Figure 15. We call this incremental product algorithm Inc_prod-NI, where
"NI" denotes the fact that we are getting new initial states.

The second way to streamline reverification is by only considering a transition on
action a_adapt, the action whose δ value was modified by learning, from these new initial
states. Thereafter, incremental reverification proceeds exactly like total (re)verification.
With these changes, Total_I becomes Inc_I-NI, shown in Figure 16. Likewise, with these
changes Total_AT becomes Inc_AT-NI, as shown in Figure 17. Figure 17 shows only changes
to procedures dfs and verify; ndfs is the same as in Figure 13. One final streamlining added
to Inc_I-NI, but not Inc_AT-NI, is that only the new initial states have "visited" reinitialized
to 0. This can be done for Invariance properties because they are not concerned with the
order of atoms in strings (see footnote 12).

12. Suppose o_change adds a new edge (v_1, v_3). If v_3 was visited on previous verification of an Invariance
property (from a state other than v_1), then all atoms that can be visited after v_3 would already have
been tested for the property. On the other hand, when testing First-Response properties the order of
atoms is relevant. Even if v_3 was previously visited, since it might not have been visited from v_1, the
addition of (v_1, v_3) could add a new string with a new atom order that might violate the First-Response
property. Therefore, v_3 needs to be revisited for First-Response properties, but not for Invariance
properties.

Procedure verify
    for each new initial state v ∈ I(S) do
        visited(v) = 0
    endfor
    for each new initial state v ∈ I(S) do
        if (visited(v) == 0) then dfs(v); endif
    endfor
end procedure

Procedure dfs(v)
    visited(v) = 1;
    if v ∈ I(S) and w ≠ 0, where w = δ(v, a_adapt), then
        if (a_adapt ≤ p) then print "Verification error"; endif
        if (visited(w) == 0) then dfs(w); endif
    else
        for each atom a_i ∈ Γ(K), a_i ≤ p, do
            if δ(v, a_i) ≠ 0 then print "Verification error"; endif
        endfor
        for each atom a_i ∈ Γ(K) set w = δ(v, a_i) and do
            if (w ≠ 0) and (visited(w) == 0) then dfs(w); endif
        endfor
    endif
end procedure

Figure 16: Inc_I-NI reverification algorithm.

Procedure verify
    for each state v ∈ V(S ⊗ ¬P) do
        visited(v) = 0
    endfor
    for each new initial state v ∈ I(S ⊗ ¬P) do
        if (visited(v) == 0) then dfs(v); endif
    endfor
end procedure

Procedure dfs(v)
    visited(v) = 1;
    if v ∈ B(S ⊗ ¬P) then
        seed = v;
        for each state u ∈ V(S ⊗ ¬P) do
            visited2(u) = 0
        endfor
        ndfs(v)
    endif
    if v ∈ I(S ⊗ ¬P) and w ≠ 0 and (visited(w) == 0),
        where w = δ(v, a_adapt), then dfs(w)
    else
        for each successor (i.e., next state) w of v do
            if (visited(w) == 0) then dfs(w); endif
        endfor
    endif
end procedure

Figure 17: Procedures verify and dfs of the Inc_AT-NI reverification algorithm.

Inc_I-NI is sound for Invariance properties, and Inc_AT-NI is sound for any downstream
or directionless property whose negation is expressible as a Büchi FSA, including
First-Response and Invariance. Assuming S ⊨ P prior to applying o_change to form S′, if these
incremental reverification algorithms conclude that S′ ⊨ P, then total reverification would
also conclude that S′ ⊨ P. Recall that total reverification is sound. Therefore, the same
is true for these incremental algorithms. Furthermore, these incremental reverification
algorithms will find all of the new violations of the property introduced by o_change. The
reason the algorithms are sound and find all new errors (for downstream or directionless
properties) is that there are only two ways that accessibility can be modified by any of our
learning operators, including o_change: locally or globally. Recall that local change alters the
accessibility of atom a_adapt or the state δ(v_i, a_adapt), and a global change alters the
accessibility of states or atoms that would be visited after δ(v_i, a_adapt). In neither case (local
or global) will the learning operator modify accessibility of atoms or states visited before,
but not after, a_adapt. Our algorithms reverify exhaustively (i.e., they reverify as much as
total reverification would) for all atoms and states visited at or after a_adapt. Since these
incremental algorithms perform reverification exactly the same way as their total versions
do after the part of the FSA that was modified by learning, they will find all new errors
introduced by learning.
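To make the control flow of Inc_I-NI (Figure 16) concrete, here is a small Python sketch. The function name, the dictionary encoding of δ, and the toy FSA are assumptions; `visited` is taken to be the set recorded by the previous verification, and `bad_atoms` the stored set of atoms a ≤ p.

```python
def inc_i_ni(delta, new_initial, a_adapt, bad_atoms, visited):
    """Incremental reverification of 'always not p' after o_change.
    delta: post-learning table, delta[v][a] -> next state or 0.
    new_initial: previously visited states whose a_adapt entry was changed.
    visited: set left by the previous verification (updated in place)."""
    new_initial = set(new_initial)
    visited.difference_update(new_initial)  # only these are reinitialized
    errors = []

    def dfs(v):
        visited.add(v)
        w = delta[v].get(a_adapt, 0)
        if v in new_initial and w != 0:
            # from a new initial state, follow only the modified transition
            if a_adapt in bad_atoms:
                errors.append((v, a_adapt))
            if w not in visited:
                dfs(w)
        else:
            # elsewhere, proceed exactly as Total_I does
            for a, t in delta[v].items():
                if t != 0 and a in bad_atoms:
                    errors.append((v, a))
            for a, t in delta[v].items():
                if t != 0 and t not in visited:
                    dfs(t)

    for v in new_initial:
        if v not in visited:
            dfs(v)
    return errors

# Hypothetical FSA after learning: o_change set delta(B, y) from 0 to C,
# which makes the previously inaccessible state C (and its bad atom) accessible.
delta = {
    "A": {"x": "B", "y": 0, "bad": 0},
    "B": {"x": "A", "y": "C", "bad": 0},
    "C": {"x": "C", "y": 0, "bad": "C"},
}
visited = {"A", "B"}  # from the prelearning verification
errors = inc_i_ni(delta, {"B"}, "y", {"bad"}, visited)
```

In this toy case the search starts at B, follows only the modified transition to C, and reports the bad atom at C without revisiting A.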

Inc_I-NI is complete for Invariance properties because it flags errors using the same
method as Total_I, and because Invariance properties are directionless and are therefore
impervious to the location of atoms in a string. On the other hand, Inc_AT-NI is not
complete for all downstream properties. For example, it is not complete for properties
that check for the first occurrence of a pattern in a string, e.g., First-Response properties.
Because Inc_AT-NI does not identify whether the new initial states are before or after the
first occurrence, there is no way to know if the first occurrence is being checked after learning.
Nevertheless, this lack of completeness for First-Response properties actually turns out to
be a very useful trait, as we will discover in Subsection 6.5.

6.3 Incremental Algorithms for o_gen and SIT_1agent/1plan

We next present our final two incremental reverification algorithms, which are applicable
only in SIT_1agent and SIT_1plan, when there is one FSA to reverify. These are powerful
algorithms in terms of their capability to reduce the complexity of reverification. However,
their soundness relies on the assumption that the learning operator's effect on accessibility
is localized, i.e., that it is o_gen with SIT_1agent or SIT_1plan but not SIT_multplans (where
o_gen might become o_add). An important advantage of these algorithms is that they never
require forming a product FSA, not even S ⊗ ¬P, regardless of whether the property is
type Response. The algorithms gain efficiency by being tailored both to a specific property
type and to a specific learning operator. The objective in developing these algorithms was
maximal efficiency and therefore they sacrifice completeness and/or the ability to find all
errors.

Procedure check-invariance-property
    if v_1 was not previously visited, then output "Verification succeeds."
    else
        if (z ⊨ ¬p) then output "Verification succeeds."
        else output "Avoid this instance of o_gen."; endif
    endif
end procedure

Figure 18: Inc_gen-I reverification algorithm.

These two incremental algorithms are tailored for reverification after operator o_gen.
Assume that property P holds for S prior to learning, i.e., S ⊨ P. Now we generalize the
transition condition M_K(v_1, v_3) = y to form S′ via o_gen(M_K(v_1, v_3)) = y ∨ z, where y ∧ z
= 0. We want to verify that S′ ⊨ P.

One additional definition is needed before presenting our algorithms. We previously
defined what it means for a c-state formula p to be true at a c-state, but to simplify
the algorithms we also define what it means for a c-state formula to be true of a transition
condition. A c-state formula p is defined to be true of a transition condition y, i.e., "y ⊨ p,"
if and only if y ≤ p (which can be implemented by testing whether for every atom a ≤ y,
a ≤ p).

Let us begin with the algorithm Inc_gen-I (which consists of two very simple tests)
tailored for o_gen and Invariance properties, shown in Figure 18. Recall that M_K(v_1, v_3) = y
and o_gen(M_K(v_1, v_3)) = y ∨ z. Inc_gen-I, which tests "z ⊨ ¬p," localizes reverification to a
restricted portion of the FSA. (For efficiency, z ⊨ ¬p is implemented as a test of the atoms
of z against p, i.e., that no atom a ≤ z satisfies a ≤ p, rather than as a test of z ≤ ¬p,
because p is typically expected to be more succinct than ¬p.) Assume
the Invariance property is P = □¬p and S ⊨ P. Then every string x in L(S) satisfies
Invariance property P, so for each x, ¬p is true of every atom in x. This implies y ⊨ ¬p.
This statement is based on our assumption that v_1 is accessible from an initial state. If not,
reverification is not needed, because the generalization will not violate P. Therefore, the
algorithm begins by testing whether v_1 was visited on previous verification. If not, the output
is "success." (Note that o_gen does not alter the accessibility of v_1.)

Inc_gen-I is sound and complete for Invariance properties. Generalization of M_K(v_1, v_3)
is application of o_gen(M_K(v_1, v_3)) = y ∨ z to form S′. This operator o_gen preserves Invariance
property P if and only if S′ ⊨ P, which is true if and only if z ⊨ ¬p. The reason for this
is that we know S satisfies P from our original verification, and therefore ¬p is true for all
atoms in all strings in L(S). The only possible new atoms in L(S′) but not in L(S) are
in z. If z ⊨ ¬p, then ¬p is true for all atoms in L(S′), which implies that every string in
L(S′) satisfies P. In other words, S′ ⊨ P. Therefore, Inc_gen-I is sound. We also know
that it is complete because if ∃a, a ≤ z, a ≤ p, then it must be the case that S′ ⊭ P.
In conclusion, Inc_gen-I, which consists of the test "z ⊨ ¬p," is sound and complete. For
maximal efficiency, our implementation of Inc_gen-I halts after the first error, although it
is simple to modify it to find all errors (and this does not significantly affect the empirical
time complexity results of Subsection 6.5, nor does it affect the worst-case time complexity).
Inc_gen-I is incremental because it is localized to just checking whether the property holds
of the newly added atoms in z, rather than all atoms in L(S′). Finally, this algorithm only
needs to be executed for o_gen, but not for o_spec+gen or o_delete+gen, because o_gen is the only
version that can add new atoms via generalization. Recall that o_spec+gen and o_delete+gen are
SMLs for Invariance properties.

Procedure check-response-property
    if y ⊨ q then
        if (z ⊨ q and z ⊨ ¬p) then output "Verification succeeds."
        else output "Avoid this instance of o_gen."; endif
    else
        if (z ⊨ ¬p) then output "Verification succeeds."
        else output "Avoid this instance of o_gen."; endif
    endif
end procedure

Figure 19: Inc_gen-R reverification algorithm.

As an example of Inc_gen-I, suppose a, b, c, d, and e are atoms, and the transition
condition y between STATE1 and STATE2 equals a. Let (a, b, b, d, d, ...), where the
ellipsis indicates infinite repetition of d, be a string in L(S) that includes STATE1 and
STATE2 as the first two vertices in its accepting run. The property is P = □¬e. Assume
the fact that this string satisfies □¬e was proved in the original verification. Suppose o_gen
generalizes M_K(STATE1, STATE2) from a to (a ∨ c) (i.e., it adds a new allowable action
c from STATE1), which adds the string (c, b, b, d, d, ...) to L(S′). Then rather than test
whether all of the elements of {a, b, c, d} are ≤ ¬e, we really only need to test whether
c ≤ ¬e, because c is the only newly added atom.
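The two tests of Inc_gen-I (Figure 18) fit in a few lines of Python. In this sketch a transition condition or c-state formula is represented by the set of atoms below it; that representation and the name `inc_gen_i` are assumptions for illustration.

```python
def inc_gen_i(v1_was_visited, z_atoms, p_atoms):
    """Reverify the Invariance property 'always not p' after o_gen added the
    atoms z to an edge leaving v1. Returns True for "Verification succeeds"
    and False for "Avoid this instance of o_gen"."""
    if not v1_was_visited:
        return True  # v1 is inaccessible, so the new atoms are never taken
    # z |= not p: no newly added atom lies below p
    return not (set(z_atoms) & set(p_atoms))

# The example from the text: z = c and the property is 'always not e'.
ok = inc_gen_i(True, {"c"}, {"e"})
```

Only the newly added atom c is inspected; the atoms a, b, and d that were already covered by the original verification are never re-examined.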

The next algorithm, Inc_gen-R, is for generalization and full Response properties (and is
nothing more than some simple tests). Like Inc_gen-I, Inc_gen-R localizes reverification to a
restricted portion of the FSA. Assume the Response property is P = □(p → ◇q), where p
is the trigger and q is the response for c-state formulae p and q. Assume property P holds
for S prior to learning (S ⊨ P). Now we generalize M_K(v_1, v_3) = y to form S′ by applying
o_gen(M_K(v_1, v_3)) = y ∨ z, where y ∧ z = 0. We need to verify that S′ ⊨ P.

Inc_gen-R for o_gen and full Response properties is in Figure 19. (Inc_gen-R is also
applicable for o_delete+gen and o_spec+gen.) The algorithm first checks whether a response could be
required of the transition condition M_K(v_1, v_3). A response is required if, for at least one
string in L(S) whose run includes (v_1, v_3), the prefix of this string before visiting vertex
v_1 includes the trigger p not followed by response q, and the string suffix after v_3 does not
include q either. Such a string satisfies the property if and only if y ⊨ q. Thus if y ⊨ q
and the property is true prior to learning (i.e., for S), then it is possible that a response is
required. In this situation (i.e., y ⊨ q), the only way to be sure we are safe (S′ ⊨ P) is if
the newly added condition z also has the response, i.e., z ⊨ q. If not, then there could be
new strings in L(S′) whose accepting runs include (v_1, v_3) but do not satisfy the property.
For example, suppose a, b, c, and d are atoms, and the transition condition y between
STATE4 and STATE5 equals d. Let x = (a, b, b, d, ...) be a string in L(S) that includes
STATE4 and STATE5 as the fourth and fifth vertices in its accepting run. The property is
P = □(a → ◇d), and therefore y ⊨ q and x ⊨ P. Suppose o_gen generalizes M_K(STATE4,
STATE5) from d to (d ∨ c), where z is c, which adds the string x′ = (a, b, b, c, ...) to
L(S′). Then z ⊭ q. If the string suffix after (a, b, b, c) does not include d, then there is
now a string that includes the trigger but does not include the response. In other words,
x′ ⊭ P. Finally, if y ⊨ q and z ⊨ q, an extra check is made to be sure z ⊨ ¬p because an
atom could be both a response and a trigger. New triggers should be avoided.

The second part of the algorithm states that if $y \not\models q$ and no new triggers are introduced
by generalization, then the operator is "safe" to do. It is guaranteed to be safe ($S' \models P$)
in this case because if $y \not\models q$, then a response cannot be required here. In other words,
because $S \models P$, for every string in $\mathcal{L}(S)$ whose accepting run includes $(v_1, v_3)$,
either no trigger occurred prior to visiting $v_1$, or every trigger was followed by a response prior
to visiting $v_1$, or a response occurred after visiting $v_3$.
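
The two cases just described can be sketched in a few lines. This is a reading of the prose above, not a transcription of Figure 19. It assumes, purely for illustration, that each Boolean expression is represented by the set of atoms that satisfy it, so that $x \models q$ becomes a subset test and $x \models \neg p$ becomes a disjointness test. The function names are hypothetical.

```python
def entails(expr_atoms, target_atoms):
    """x |= t: every atom satisfying x also satisfies t (assumed set encoding)."""
    return set(expr_atoms) <= set(target_atoms)


def entails_not(expr_atoms, target_atoms):
    """x |= not t: no atom satisfying x satisfies t."""
    return set(expr_atoms).isdisjoint(target_atoms)


def inc_gen_r(y, z, p, q):
    """Return True if generalizing y to (y or z) is known to preserve the property.

    False corresponds to the output "Avoid this instance of o_gen".
    y, z, p, q are sets of atoms; p is the trigger and q the response.
    """
    if entails(y, q):
        # A response may be required here, so z must supply it
        # and must not introduce a new trigger.
        return entails(z, q) and entails_not(z, p)
    # No response can be required here; only new triggers are a concern.
    return entails_not(z, p)


# The example from the text: y = d, z = c, trigger a, response d.
print(inc_gen_r(y={"d"}, z={"c"}, p={"a"}, q={"d"}))
```

On the STATE4/STATE5 example the sketch rejects the generalization, because y supplies the response d but z = c does not.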

$Inc_{gen-R}$ is sound but not complete for full Response properties. Its soundness is based
on the fact that $o_{gen}$ does not increase accessibility of vertices or atoms visited after state
$v_3$ (i.e., globally) and therefore reverification can be localized to only $M_{\mathcal{K}}(v_1, v_3)$.
$Inc_{gen-R}$ is not complete because it may output "Avoid this instance of $o_{gen}$" when in fact
$o_{gen}$ is safe to do. For example, if $y \models q$ but $z \not\models q$, the algorithm will output
"Avoid this instance of $o_{gen}$." Yet it may be the case that $S' \models P$ if no trigger $p$
precedes response $q$ in $\mathcal{L}(S')$, or if a response is after $v_3$. When $Inc_{gen-R}$ outputs
verification failure, it does not supply sufficient information for FSA repair. Errors must be
resolved by selecting another learning operator. Note that "error" has a different connotation for
$Inc_{gen-R}$ than for the AT verification algorithms. Any "Avoid..." output is considered an error.

Another disadvantage of $Inc_{gen-R}$ is that it does not allow generalizations that add
triggers. If it is desirable to add new triggers during generalization, then one needs to modify
$Inc_{gen-R}$ to call $Inc_{AT}$ when reverification with $Inc_{gen-R}$ fails, instead of outputting
"Avoid this instance of $o_{gen}$." This modification also fixes the false error problem, and
preserves the enormous time savings (see Section 6.5) when reverification succeeds.

6.4 Theoretical Worst-Case Time Complexity Analysis 

Recall that one of our primary objectives is timely agent responses. This section compares
the worst-case time complexity of the algorithms. Let us begin with the time complexity
of $Total_{prod}$. This is $O((\prod_{i=1}^{n} |V(S_i)|) * |\Gamma(\mathcal{K})| * n)$ to form the product
of the individual agent FSAs for Invariance property verification, and
$O((\prod_{i=1}^{n} |V(S_i)|) * |P| * |\Gamma(\mathcal{K})| * n)$ to form the product for AT verification.
Here $n$ is the number of agents, $|V(S_i)|$ is the number of states in single agent FSA $S_i$, $|P|$
is the number of states in the property FSA $P$, and $|\Gamma(\mathcal{K})|$ is the total number of
atoms (multiagent actions). The reason for this complexity result is that there are
$\prod_{i=1}^{n} |V(S_i)|$ product states for Invariance property verification, and
$(\prod_{i=1}^{n} |V(S_i)|) * |P|$ product states for AT verification. The outer loop of $Total_{prod}$
iterates through all product states. The inner loop of $Total_{prod}$ iterates through all
$|\Gamma(\mathcal{K})|$ atoms. Note that $(\prod_{i=1}^{n} |V(S_i)|) * |\Gamma(\mathcal{K})|$ and
$(\prod_{i=1}^{n} |V(S_i)|) * |P| * |\Gamma(\mathcal{K})|$ are the sizes of the product FSA transition
function tables built for $Total_I$ and $Total_{AT}$, respectively. $Total_{prod}$ does
at most $n$ lookups for each table entry.
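
To give a feel for these bounds, the short calculation below plugs in the sizes used in the experiments of Section 6.5 (three agents, 12 multiagent actions, 25 or 45 states per agent). It is only a worked arithmetic sketch. The function names are hypothetical, and the number of property FSA states $|P|$ is left as a parameter because it is not stated in this section.

```python
from math import prod


def product_table_entries(state_counts, num_atoms, property_states=1):
    """Size of the product transition table: (prod |V(S_i)|) * |P| * |Gamma(K)|."""
    return prod(state_counts) * property_states * num_atoms


def total_prod_lookups(state_counts, num_atoms, property_states=1):
    """Upper bound on lookups by Total_prod: at most n lookups per table entry."""
    entries = product_table_entries(state_counts, num_atoms, property_states)
    return entries * len(state_counts)


if __name__ == "__main__":
    for states in (25, 45):
        sizes = [states] * 3  # three agents with equally sized FSAs
        print(states, product_table_entries(sizes, 12), total_prod_lookups(sizes, 12))
```

For Invariance verification this gives 187,500 table entries for 25-state FSAs and 1,093,500 for 45-state FSAs, with at most three lookups per entry.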

By comparison, our incremental algorithm $Inc_{prod}$ for generating the product FSA has
time complexity $O((\prod_{i=1}^{n-1} |V(S_i)|) * n)$ or $O((\prod_{i=1}^{n-1} |V(S_i)|) * |P| * n)$ to
modify the product FSA for Invariance property reverification or AT reverification, respectively.
This is because the total number of revised product states is $(\prod_{i=1}^{n-1} |V(S_i)|)$ or
$(\prod_{i=1}^{n-1} |V(S_i)|) * |P|$, and only one atom is considered (because we assume $o_{change}$
changes the next state for a single atom $a_{adapt}$). The time complexity of $Inc_{prod-NI}$ is the
same as that of $Inc_{prod}$.

Next consider the worst-case time complexity of total (re)verification after the product
has been formed. It is $O((\prod_{i=1}^{n} |V(S_i)|) * |\Gamma(\mathcal{K})|)$ for $Total_I$. This is
because, in the worst case, every product state is accessible and therefore every entry in the
product FSA transition function table is visited. Assuming $|B|$ is the number of "bad" (in the Büchi
sense) states in the product FSA, then the worst-case time complexity of $Total_{AT}$ is
$O((|B| + 1) * (\prod_{i=1}^{n} |V(S_i)|) * |P| * |\Gamma(\mathcal{K})|)$. This is because, in the worst
case, every entry in the product FSA transition function table is visited once on the depth-first
search and, for each bad state, again on the nested depth-first search. Unfortunately, the worst-case
time complexity of $Inc_{I-NI}$ and $Inc_{AT-NI}$ are the same as that of $Total_I$ and $Total_{AT}$,
respectively. This is because, in the worst case, every product state is still accessible. The
restriction to transition only on $a_{adapt}$ at first does not reduce the "big O" complexity.
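
The $(|B| + 1)$ factor can be seen in a naive search of the following shape: one pass to find the accessible states, then one further pass for each accessible bad state to test whether it lies on a cycle. This is a simplified sketch and not the paper's $Total_{AT}$. The dictionary representation, the use of 0 for "no transition," and the function names are assumptions made for illustration.

```python
def reachable(delta, starts):
    """Return every state reachable from `starts` (iterative depth-first search)."""
    seen, stack = set(), list(starts)
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        # 0 marks "no transition", following the tabular convention in the text.
        stack.extend(t for t in delta.get(state, {}).values() if t != 0)
    return seen


def has_accepting_cycle(delta, initial_states, bad_states):
    """True if some accessible bad state lies on a cycle (a property violation)."""
    for bad in reachable(delta, initial_states) & set(bad_states):
        successors = [t for t in delta.get(bad, {}).values() if t != 0]
        # Nested search: one more pass over the table per accessible bad state.
        if bad in reachable(delta, successors):
            return True
    return False


# Product FSA with states 1..3; state 3 is on the cycle 2 -> 3 -> 2.
delta = {1: {"a": 2, "b": 0}, 2: {"a": 3, "b": 0}, 3: {"a": 2, "b": 0}}
print(has_accepting_cycle(delta, [1], {3}))
```

Each call to `reachable` may touch every table entry, which is where the worst-case product of $(|B| + 1)$ and the table size comes from.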

Finally, we consider the worst-case complexity of $Inc_{gen-I}$ and $Inc_{gen-R}$. First, for any
Boolean expression $x$, we define $|x|$ to be the number of elements in
$\{a \mid a \in \Gamma(\mathcal{K}) \text{ and } a \models x\}$. For Invariance properties $P$ of the form
$\Box \neg p$, $|P|$ equals $|p|$ since we test for each atom $a$ whether $a \models p$ rather than
$a \models \neg p$, because we expect $|p| < |\neg p|$ in general. Then $Inc_{gen-I}$ requires time
$O(|z| * |p|)$ to determine whether $z \models \neg p$. (Checking whether $v_1$ was visited requires
constant time.) Assuming $|p| < \prod_{i=1}^{n} |V(S_i)|$ (which should be true except under
bizarre circumstances), and since $|z| < |\Gamma(\mathcal{K})|$, $Inc_{gen-I}$ saves time over $Total_I$.
$Inc_{gen-R}$ requires time $O((|y| * |q|) + (|z| * (|p| + |q|)))$ to determine whether $y \models q$,
and then to determine whether $z \models q$ and $z \models \neg p$.^13 Clearly
$(|y| + |z|) \le |\Gamma(\mathcal{K})|$ because by the definition of $o_{gen}$, $y \land z = 0$. Therefore,
assuming $(|p| + |q|) < ((|B| + 1) * \prod_{i=1}^{n} |V(S_i)| * |P|)$ (which, again, should be true
except in bizarre circumstances), the worst-case time complexity of $Inc_{gen-R}$ is lower than that
of $Total_{AT}$.
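
The $O(|z| * |p|)$ cost quoted for $Inc_{gen-I}$ corresponds to comparing every atom of $z$ against every atom of $p$. The sketch below makes that nested loop explicit. It is an illustration under stated assumptions rather than the paper's algorithm: expressions are encoded as lists of satisfying atoms, and the sketch assumes that a generalization at a vertex $v_1$ that was never visited can be accepted without further checks.

```python
def z_entails_not_p(z_atoms, p_atoms):
    """Check z |= not p by pairwise comparison, at most |z| * |p| steps."""
    for a in z_atoms:
        for b in p_atoms:
            if a == b:
                return False  # some atom of z satisfies p
    return True


def inc_gen_i(v1_visited, z_atoms, p_atoms):
    """Return True if the generalization is known to preserve the invariance property."""
    if not v1_visited:
        # Assumption of this sketch: an unvisited vertex cannot affect the property.
        return True
    return z_entails_not_p(z_atoms, p_atoms)


print(inc_gen_i(True, ["c"], ["a", "b"]))
```

Neither loop depends on the number of product states, which is why the cost is unaffected by FSA size.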

6.5 Empirical Time Complexity Comparisons 

Worst-case time complexity is not always a useful measure. Therefore we supplement the
worst-case analyses with empirical results on cpu time. Our primary objective in these
experiments is to compare the incremental algorithms with total reverification, as well as
with each other, for the context of evolving behaviorally correct FSAs. The time required
for reverification is significant to address if we want timely agent responses, because
reverification occurs after every learning operator application.

13. Determining whether $z \models \neg p$ can be done by checking that no atom $a \models z$ satisfies
$a \models p$. Also, for $Inc_{gen-R}$, an additional time $O(|\Gamma(\mathcal{K})| - (|y| + |z|))$ is
needed to identify $y$ when using the representation of Table 1. This does not affect our complexity
comparisons or conclusions.

Before describing the experimental results, let us consider the experimental methodology.
All code was written in C and run on a Sun Ultra 10 workstation. In our experiments,
FSAs were randomly initialized, subject to certain restrictions. The reason for randomness
is that this is a typical way to initialize individuals in a population for an evolutionary
algorithm. There are two restrictions on the FSAs. First, although determinism and
completeness of FSAs are execution, rather than verification, issues and therefore need not
be enforced for these experiments, our choice of tabular representation of the FSAs (see
Table 1) restricts the FSAs to being deterministic. Second, because the incremental
algorithms assume $S \models P$ prior to learning, we restrict the FSAs to comply with this. There
are two alternative methods for enforcing this in the experiments: (1) use sparse FSAs (i.e.,
with many 0s) and keep generating new FSAs until total verification succeeds (which does
not take long with sparse FSAs), or (2) use dense FSAs engineered to guarantee property
satisfaction. In particular, dense FSAs are forced to satisfy Invariance properties $\Box \neg p$ by
inserting 0s in every column of the transition function table (such as Table 1) labeled with
an atom $a \models p$. Dense FSAs are forced to satisfy First-Response properties with trigger $p$
and response $q$ by inserting 0s in every column labeled with an atom $a \models p$. This eliminates
triggers initially. Note that either of these methods is a viable way to initialize a population
of FSAs for evolution because it ensures early success in satisfying the property. This paper
presents only the results with dense FSAs. See Gordon (1999) for the results with sparse
FSAs.^14
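
The dense initialization for Invariance properties can be sketched as follows. Table 1 lies outside this section, so the layout is an assumption: one row per state, one column per atom, each entry holding a next-state number from 1 to the number of states, with 0 meaning no transition. The function name and parameters are hypothetical.

```python
import random


def dense_fsa_table(num_states, atoms, p_atoms, seed=None):
    """Build a random dense transition table that cannot take any atom satisfying p."""
    rng = random.Random(seed)
    table = []
    for _ in range(num_states):
        row = {}
        for atom in atoms:
            # Columns labeled with an atom a |= p are zeroed out, so no
            # transition on such an atom exists anywhere in the FSA.
            row[atom] = 0 if atom in p_atoms else rng.randint(1, num_states)
        table.append(row)
    return table


table = dense_fsa_table(25, list("abcdefghijkl"), p_atoms={"a", "b"}, seed=1)
print(sum(1 for row in table for t in row.values() if t == 0))
```

Every other entry points to a state, which is what makes the table dense.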

Another experimental design decision was to show scaleup in the size of the FSAs.
Throughout the experiments there were assumed to be three agents, each with the same 12
multiagent actions. Each individual agent FSA had 25 or 45 states.^15 With 45 states the
product transition table contains $45^3 * 12$ entries.

A suite of five Invariance and five Response properties, given in Appendix C, was used.
Invariance properties were expressed by storing the set of all atoms $a \models p$ for property
$\Box \neg p$. This suffices for all of our algorithms tailored for Invariance properties. For AT
verification, Response properties were expressed with a First-Response Büchi FSA for the
negation of the property. An explanation of why this is adequate for our experiments is
below. For $Inc_{gen-R}$, trigger $p$, and response $q$, all atoms $a_i \models p$ and $a_j \models q$
were stored. Six independent experiments were performed to verify each of the properties. In other
words, every reverification algorithm was tested with 30 runs: six runs for each of five Invariance
or five Response properties. For every one of these runs, a different random seed was used
for generating the three FSAs. However, it is important to point out that all algorithms
being compared with each other saw the same FSAs. For example, in Table 2 we compare
$Inc_{prod}$ (row 1), $Inc_{prod-NI}$ (row 4), and $Total_{prod}$ (row 7). They all input the same three
FSAs. Furthermore, the learning operator (specific instantiation of the operator schema)
was the same for all algorithms being compared.

14. Sparse FSAs have an additional advantage, assuming they remain relatively sparse after evolution. The 
advantage is their succinctness for efficient execution, as in multientity models (Tennenholtz & Moses, 
1989). 

15. The sparse FSAs had 25, 45, or 65 states. To get accurate timing results with the dense FSAs, though,
65 states required a cpu free of any interfering processes for an unreasonably long time.

Let us consider the results in Tables 2 and 3. In both of these tables, each row corresponds
to an algorithm. Rows are numbered for later reference. The entries give performance
results, to be described shortly. Table 2 compares the performance of total reverification
with the algorithms of Subsection 6.2, which were designed for $o_{change}$ and all
situations. The situation assumed for these experiments was $SIT_{multplans}$. Three dense
random (subject to the above-mentioned restrictions) FSAs were generated, and then the
product was formed. The result was a product FSA satisfying the property. Operator
$o_{change}$ was then applied, which consisted of a random change (the new entry points to a
state rather than being 0) to a randomly chosen table entry in the FSA transition table of a
randomly chosen one of the three agents. Finally, the product FSA was re-formed and
reverification done.

The methodology for generating Table 3 was similar to that for Table 2, except that
$o_{gen}$ was the learning operator and the situation was assumed to be $SIT_{1plan}$. In other
words, the product FSA of the three agents was formed, $o_{gen}$ was applied to it, the product
was taken with the property FSA if needed for AT verification, and then reverification was
performed. Operator $o_{gen}$ consisted of choosing a random state $s_1$ and a random action $a_i$
for which $\delta(s_1, a_i) = s_3$, and choosing a random action $a_j$ for which
$\delta(s_1, a_j) = 0$, and then setting $\delta(s_1, a_j) = s_3$.
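
A minimal sketch of this instantiation of $o_{gen}$ on a tabular FSA is given below. It assumes the same hypothetical layout as a list of rows, one per state, mapping each action to a next-state number with 0 for no transition. Unlike the description above, the sketch only draws $s_1$ from states where the operator is applicable, which is a simplification made here so that the function always succeeds when possible.

```python
import random


def apply_o_gen(table, rng):
    """Copy an existing transition target of a random state onto an undefined action.

    Returns (s1, a_i, a_j, s3), or None if no state has both a defined
    and an undefined action. States are numbered from 1; table[s - 1] is row s.
    """
    candidates = []
    for s1, row in enumerate(table, start=1):
        defined = [a for a, target in row.items() if target != 0]
        undefined = [a for a, target in row.items() if target == 0]
        if defined and undefined:
            candidates.append((s1, defined, undefined))
    if not candidates:
        return None
    s1, defined, undefined = rng.choice(candidates)
    a_i = rng.choice(defined)      # action with delta(s1, a_i) = s3
    a_j = rng.choice(undefined)    # action with delta(s1, a_j) = 0
    s3 = table[s1 - 1][a_i]
    table[s1 - 1][a_j] = s3        # generalize: a_j now also leads to s3
    return s1, a_i, a_j, s3


table = [{"a": 2, "b": 0}, {"a": 0, "b": 0}]
print(apply_o_gen(table, random.Random(0)), table)
```

In terms of the transition condition, the edit changes the condition on $(s_1, s_3)$ from $y$ to $y \lor z$, where $z$ is the newly added action.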

Any column in Tables 2 or 3 labeled "sec" gives a mean, over 30 runs, of the cpu time
of the algorithm. Columns labeled "spd" give the speedup over total, i.e., the cpu time of
the incremental algorithm in that row divided by the cpu time of the corresponding total
algorithm. For example, the "spd" entry for $Inc_{prod}$ in row 1 gives its cpu time divided
by the cpu time of $Total_{prod}$ in row 7. Columns labeled "err" show the average number
of verification errors over 30 runs. This is important to monitor because, for example, the
cpu time is most strongly correlated with the number of states "visited" during dfs, and
"visited2" during ndfs when AT verification is used. Every property error causes ndfs to be
called with a nested search, which may be quite time-consuming. Also, it is important to
note that we did not force any verification errors to occur. It was our objective to monitor
cpu time under natural circumstances for evolving FSAs. When errors arose they were
the natural result of applying a learning operator. The "err" columns are missing from
Table 2 because the values are all 0, i.e., no errors occurred during the experiments due to
applying $o_{change}$, although we have observed errors with this operator outside
the experiments. The lack of errors in the experiments resulted from the particular random
FSAs that happened to be generated during the experiments. Errors are quite common
with the specific $o_{gen}$ version of $o_{change}$, as can be seen in Table 3. Note that "N/A" is in
the "err" column for anything other than a verification algorithm because "err" refers to
verification errors.
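
As a quick check of the "spd" definition, the ratio can be recomputed from the "sec" entries of Table 2 for $Inc_{prod}$ (row 1) and $Total_{prod}$ (row 7). The helper name is hypothetical. Note that smaller values mean a larger time saving.

```python
def spd(incremental_sec, total_sec):
    """Speedup entry: cpu time of the incremental algorithm over the total one."""
    return incremental_sec / total_sec


# Table 2, rows 1 and 7, for 25-state and 45-state FSAs.
print(round(spd(0.000157, 0.031594), 5))
print(round(spd(0.000492, 0.192774), 5))
```

Rounded to five places, these reproduce the .00497 and .00255 entries in row 1.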

The algorithms (rows) should be considered in triples "p," "v," and "b," or else as a
single item "v+b." A "p" next to an algorithm name in Table 2 or 3 denotes it is a product
algorithm, a "v" that it is a verification algorithm, and a "b" that it is the sum of the "p"
and "v" entries, i.e., the time for both re-forming the product and reverifying. For example,
$Inc_I$ (b) is considered to be an algorithm pair consisting of $Inc_{prod}$ (p) followed by $Total_I$
(v) (see rows 1-3 of Table 2). If no product needs to be formed, then the "b" version of the
algorithm is identical to the "v" version, in which case there is only one row labeled "v+b."

Tables 4, 5, and 6 re-present a subset (cpu time only) of the data from Tables 2 and 3
in a format that facilitates some comparisons. In other words, Tables 4, 5, and 6 contain
no new data, only reformatted data from Tables 2 and 3. In Tables 4, 5, and 6, results are
grouped by "p," "v," or "b."

Let us elaborate on one more interesting issue before listing our experimental hypotheses.
Recall that we are using a First-Response property FSA and that this FSA checks only that
the first trigger in every string is followed by a response. For our evolutionary paradigm
(with dense FSA initialization) when using $Inc_{AT-NI}$, verifying a First-Response property
is equivalent to verifying the full Response property.^16 The false errors found by $Inc_{AT-NI}$
due to its incompleteness are in fact violations of the full Response property. Therefore
for $Inc_{AT-NI}$, First-Response FSAs are entirely adequate for reverification of full Response
properties. Because we used the evolutionary paradigm in these experiments, and because
$Inc_{AT-NI}$ found the same number of errors as $Total_{AT}$ (i.e., $Inc_{AT-NI}$ found no false
errors), for the FSAs in these experiments testing First-Response properties was equivalent
to testing full Response properties.

For our experiments, five hypotheses were tested: 

H1: Algorithms tailored specifically for Invariance properties are faster than those for
AT verification, because the latter are general-purpose (and the product algorithms
include an additional FSA).

H2: The incremental algorithms are faster than the total algorithms for both product and
reverification. This is expected to be true because they were tailored for learning.

H3: The "NI" versions of the incremental algorithms are faster than their counterparts,
which do not find new initial states. This is expected because of the increase in
streamlining.

H4: $Inc_{gen-I}$ and $Inc_{gen-R}$ are the fastest of all the algorithms, because they are tailored
for a less generic learning operator (i.e., $o_{gen}$ rather than $o_{change}$), plus they are also
tailored for one specific property type, and they sacrifice finding all errors.

H5: $Inc_{gen-I}$ and $Inc_{gen-R}$ will have the best scaleup properties. They will not take more
time as FSA size increases. This latter expectation comes from the worst-case time
complexity analysis.

Subsidiary issues we examine are the percentage of wrong predictions (for $Inc_{AT-NI}$ and
$Inc_{gen-R}$, which are not complete algorithms), and the maximum observed speedup.
The results are the following (unless stated otherwise, look at the "sec" columns):

H1: To see the results, in Table 2 look at rows 1 through 9 and compare each row r
in this set with row r+9. In other words, compare row 1 with row 10, row 2 with
row 11, and so on. Rows 1 through 9 are algorithms for Invariance properties, and

16. The reason is the following. Dense FSA initialization creates FSAs with no triggers. A learning operator
is then applied. After learning, $Inc_{AT-NI}$ begins reverification at every state from which a new trigger
could have been added by learning. Thus every trigger in the FSA will be checked to see if it is followed
by a response. At every generation of our evolutionary learning paradigm, at most one learning operator
is applied per FSA, and this is immediately followed by reverification and error resolution (if needed).
Therefore every new trigger will be caught by $Inc_{AT-NI}$ and, if not followed by a response, the problem
will be immediately resolved.

                          25-state FSAs           45-state FSAs
Row   Algorithm           sec         spd         sec         spd
  1   $Inc_{prod}$ p      .000157     .00497      .000492     .00255
  2   $Total_I$ v         .023798     .95663      .206406     .97430
  3   $Inc_I$ b           .023955     .07023      .206898     .51133
  4   $Inc_{prod-NI}$ p   .000206     .00652      .000617     .00320
  5   $Inc_{I-NI}$ v      .000169     .00680      .000528     .00320
  6   $Inc_{I-NI}$ b      .000375     .00110      .001762     .00435
  7   $Total_{prod}$ p    .031594     1.0         .192774     1.0
  8   $Total_I$ v         .024877     1.0         .211851     1.0
  9   $Total_I$ b         .340817     1.0         .404625     1.0
 10   $Inc_{prod}$ p      .000493     .00507      .001521     .00259
 11   $Total_{AT}$ v      .021103     .98903      .177665     .96869
 12   $Inc_{AT}$ b        .024798     .20022      .180707     .23441
 13   $Inc_{prod-NI}$ p   .000574     .00590      .001786     .00304
 14   $Inc_{AT-NI}$ v     .009011     .37450      .090824     .49520
 15   $Inc_{AT-NI}$ b     .009585     .07900      .092824     .12013
 16   $Total_{prod}$ p    .097262     1.0         .587496     1.0
 17   $Total_{AT}$ v      .024062     1.0         .183409     1.0
 18   $Total_{AT}$ b      .121324     1.0         .770905     1.0

Table 2: Average performance over 30 runs (5 properties, 6 runs each) with operator $o_{change}$
and dense FSAs. Rows 1 through 9 are for reverification of Invariance properties
and rows 10 through 18 are for AT reverification of Response properties.





                          25-state FSAs                       45-state FSAs
Row   Algorithm           sec           spd       err         sec           spd       err
  1   $Inc_{gen-I}$ v+b   .000001       4.25e-5   .20         .000002       9.75e-6   .07
  2   $Inc_{I-NI}$ v+b    .000002       8.51e-5   .20         .000003       1.46e-5   .07
  3   $Total_I$ v+b       .023500       1.0       .20         .205082       1.0       .07
  4   $Inc_{gen-R}$ v+b   .000007       7.23e-8   .73         .000006       2.09e-9   .73
  5   $Inc_{prod-NI}$ p   .000006       5.22e-5   N/A         .000006       8.51e-6   N/A
  6   $Inc_{AT-NI}$ v     94.660700     .98099    3569.33     2423.550000   .84442    12553.40
  7   $Inc_{AT-NI}$ b     94.660706     .97982    N/A         2423.550006   .84421    N/A
  8   $Total_{prod}$ p    .114825       1.0       N/A         .704934       1.0       N/A
  9   $Total_{AT}$ v      96.495400     1.0       3569.33     2870.080000   1.0       12553.40
 10   $Total_{AT}$ b      96.610225     1.0       N/A         2870.784934   1.0       N/A

Table 3: Average performance over 30 runs (5 properties, 6 runs each) with operator $o_{gen}$
and dense FSAs. Rows 1 through 3 are for reverification of Invariance properties
and rows 4 through 10 are for reverification of Response properties.

Row   Algorithm           25-state FSAs   45-state FSAs
  1   $Inc_{prod}$ p      .000157         .000492
  2   $Inc_{prod-NI}$ p   .000206         .000617
  3   $Total_{prod}$ p    .031594         .192774
  4   $Total_I$ v         .023798         .206406
  5   $Inc_{I-NI}$ v      .000169         .000528
  6   $Total_I$ v         .024877         .211851
  7   $Inc_I$ b           .023955         .206898
  8   $Inc_{I-NI}$ b      .000375         .001762
  9   $Total_I$ b         .340817         .404625

Table 4: Average cpu time (in seconds) over 30 runs with operator $o_{change}$ and five Invariance
properties. This table is a duplication of some of the material in Table 2.





Row   Algorithm           25-state FSAs   45-state FSAs
  1   $Inc_{prod}$ p      .000493         .001521
  2   $Inc_{prod-NI}$ p   .000574         .001786
  3   $Total_{prod}$ p    .097262         .587496
  4   $Total_{AT}$ v      .021103         .177665
  5   $Inc_{AT-NI}$ v     .009011         .090824
  6   $Total_{AT}$ v      .024062         .183409
  7   $Inc_{AT}$ b        .024798         .180707
  8   $Inc_{AT-NI}$ b     .009585         .092824
  9   $Total_{AT}$ b      .121324         .770905

Table 5: Average cpu time (in seconds) over 30 runs with operator $o_{change}$ and five Response
properties. This table is a duplication of some of the material in Table 2.

Row   Algorithm           25-state FSAs   45-state FSAs
  1   $Inc_{prod}$ p      .000006         .000006
  2   ...
  3   $Total_{prod}$ p    .114825         .704934
  4   $Inc_{gen-R}$ v     .000007         .000006
  5   $Inc_{AT}$ v        94.660700       2423.550000
  6   $Total_{AT}$ v      96.495400       2870.080000
  7   $Inc_{gen-R}$ b     .000007         .000006
  8   $Inc_{AT}$ b        94.660706       2423.550006
  9   $Total_{AT}$ b      96.610225       2870.784934

Table 6: Average cpu time (in seconds) over 30 runs with operator $o_{gen}$ and five Response
properties. This table is a duplication of some of the material in Table 3.

rows 10 through 18 are algorithms for AT verification. In Table 3, rows 1 through
3 are algorithms for Invariance properties, and rows 5 through 10 are algorithms for
AT verification. Compare row 2 with 7, and 3 with 10. (Rows 1 and 4 cannot be
compared because row 4 has an algorithm tailored for Response properties.) Note
that these comparisons are between a "v+b" and a "b." Since "v+b" means "v" or
"b," this is a correct comparison. These results show that H1 is mostly, but
not completely, confirmed. It is confirmed for all results in Table 3. On the other
hand, the results are mixed for Table 2.

H2: The easiest way to compare is to examine Tables 4, 5, and 6. In these cases the
comparison is between the first two rows labeled "p" (or "v" or "b") versus the third
row of that same label. The reason for making these comparisons is that the first
two rows of a given label correspond to an incremental algorithm (except for row 4 of
Tables 4 and 5) and the third row of a given label corresponds to a total algorithm.
Alternatively, one could examine Tables 2 and 3. In Table 2, rows 1 through 6 (other
than 2) and 10 through 15 (other than 11) are incremental algorithms, and rows 2, 11,
7 through 9, and 16 through 18 are total reverification algorithms. The appropriate
comparisons are between rows 1 and 7, 4 and 7, 5 and 8, 3 and 9, 6 and 9, 10 and 16,
13 and 16, 14 and 17, 12 and 18, and 15 and 18. In Table 3, rows 1, 2, and 4 through
7 are incremental algorithms, and rows 3 and 8 through 10 are total. The appropriate
comparisons are between rows 1 and 3, 2 and 3, 4 and 10, 5 and 8, 6 and 9, and 7
and 10. All results confirm H2. The statistical significance of the comparisons in
Tables 2 and 3 was tested. Using an exact Wilcoxon rank-sum test, all comparisons
relevant to hypothesis H2 in Table 2 are statistically significant (p < 0.01 and, in
most cases, p < 0.0001). In Table 3, however, the differences between $Inc_{AT-NI}$ and
$Total_{AT}$ (both the (v) and (b) versions) are not statistically significant at the p < 0.01
level. All other comparisons in Table 3 are significant at the p < 0.01 level.

H3: This hypothesis does not apply to the algorithms for re-forming the product FSA
because, obviously, it will require more time to get the new initial states for the
"NI" versions. We wish to test the overall time savings of the "NI" versions, so we
concentrate on the rows labeled "b." The relevant comparisons are row 7 versus 8
in Table 4 and row 7 versus 8 in Table 5. (Alternatively, one could compare row 3
versus 6, and row 12 versus 15 in Table 2.) Each of these comparisons is between an
"NI" version and a counterpart version of the algorithm that is the same as the "NI"
version except that it does not find new initial states. Tables 3 and 6 are not relevant
because they only have the "NI" versions but not their counterparts. (We only saw
the need to make one comparison between all "NI" versions and their counterparts,
which is reflected in Table 2.) All results confirm hypothesis H3. After testing
the statistical significance, it is found that the results are significant (p < 0.01).

H4: To determine H4 requires considering Table 3 but not Table 2. This is because we only
need to compare algorithms for which $o_{gen}$ has been applied. Compare row 1 versus 2,
1 versus 3, 4 versus 7, and 4 versus 10 to see the results. All results show $Inc_{gen-I}$ (row
1) and $Inc_{gen-R}$ (row 4) to be at least as fast as the other algorithms. Therefore H4
is confirmed. In all cases other than $Inc_{gen-I}$ (row 1) versus $Inc_{I-NI}$ (row 2), there
is a noticeable speedup. In most cases, the speedup is quite dramatic. All noticeable
speedups are statistically significant (p < 0.0001).

H5: To test H5, compare the first "spd" column (for 25-state FSAs) with the second
column with this label (for 45-state FSAs). A more desirable scaleup shows a lower
value for "spd" as the size of the FSA increases. It implies that the ratio of the cpu
time of the incremental algorithm to the cpu time of the total algorithm decreases
more (or increases less) as the FSA size increases. One should make this two-column
comparison for rows 1 through 6 (but not 2) and 10 through 15 (but not 11) of Table 2,
and rows 1 and 2, and 4 through 7 of Table 3 because these are all the incremental
algorithms. (We don't care about the total algorithms because "spd" is, by definition,
always 1.0 for them.)^17 If one considers the results of algorithms appearing in both
tables (e.g., $Inc_{I-NI}$ shows different scaleup properties in the two tables, but we need
to consider both sets of results), then clearly $Inc_{gen-I}$ (row 1) and $Inc_{gen-R}$ (row 4) in
Table 3 show the best scaleup of all the incremental algorithms. H5 is confirmed. It
is apparent from the "sec" columns that the time complexity of these two algorithms
does not increase (other than minor fluctuations) as FSA size increases (see Table 3).

A couple of subsidiary issues are now addressed. For one, recall that $Inc_{AT-NI}$ and
$Inc_{gen-R}$ are not complete. Therefore, it is relevant to consider the percentage of incorrect
predictions (i.e., false errors) they made. $Inc_{AT-NI}$ made none. For the results in Table 3,
33% of $Inc_{gen-R}$'s predictions were wrong (i.e., false errors) for the size 25 FSAs, and 50%
were wrong for the size 45 FSAs.

Finally, consider the maximum observable speedup. $Inc_{gen-R}$ shows a ½-billion-fold
speedup over $Total_{AT}$ on size 45 FSA problems (averaged over 30 runs)! This alleviates much
of the concern about $Inc_{gen-R}$'s false error rate. For example, given the rapid reverification
time of $Inc_{gen-R}$, an agent could use it to reverify a long sequence of learning operators
culminating in one that satisfies the property in considerably less time than it takes $Total_{AT}$
to reverify one learning operator.
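
The size of this speedup follows directly from the Table 3 entries for 45-state FSAs: $Total_{AT}$ (b) takes 2870.784934 seconds and $Inc_{gen-R}$ takes .000006 seconds. The short calculation below, with a hypothetical helper name, estimates how many $Inc_{gen-R}$ reverifications fit into the time of a single $Total_{AT}$ reverification.

```python
def reverifications_per_total(total_sec, incremental_sec):
    """How many incremental reverifications fit in one total reverification."""
    return total_sec / incremental_sec


# Table 3, 45-state FSAs: Total_AT (b) versus Inc_gen-R (v+b).
ratio = reverifications_per_total(2870.784934, 0.000006)
print(f"{ratio:.3g}")
```

The ratio is a little under half a billion, so even a long sequence of rejected operators costs far less than one total reverification.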

We conclude this section by summarizing, in Table 7, the fastest algorithm (based on
our results) for every operator, situation, and property type. In Table 7, it is assumed
that a First-Response FSA is used for AT verification of Response properties. Operator
$o_{add-action}$ is omitted from this table because it is not clear at this time whether it would
be faster to apply total reverification or perform multiple applications of the incremental
algorithm (one for each primitive operator application). Section 8 considers an alternative
solution as future work. In Table 7, "None" means no reverification is required, i.e., the
learning operator is a priori guaranteed to be an SML for this situation and property class.

17. If "spd" $\neq$ 1.0 for a total algorithm, this is due to the statistical variation in run time.

Table 7: Learning operators with the fastest reverification method.

7. Related Work

There has been a great deal of recent research on model checking, and even on model
checking of distributed systems (Holzmann, 1991). Nevertheless, there is very little in the
literature about model checking applied to systems that change. Two notable exceptions
are the research of Sokolsky and Smolka (1994) on incremental reverification and that of
Sekar et al. (1994). Both of these papers are about reverification of software after user
edits rather than adaptive agents. Nevertheless the work is related. Sokolsky and Smolka
use the modal $\mu$-calculus to express Invariance and Liveness properties. They present an
incremental version of a model checker that does block-by-block global computations of fixed
points, rather than AT or property-specific model checking as we do. The learning operators
assumed by their algorithm are edge deletions/additions on a representation similar to FSAs
called LTS (but unlike our multiagent work, they assume a single LTS). The worst-case
time complexity of their algorithm is the same as that of total reverification, although
their empirical results are good. Note that we have a priori results for edge deletion.
However, we do not have an incremental algorithm specifically tailored for edge addition (for
multiple agents and AT or property-specific model checking); thus this may be a fruitful
direction for future research. Sekar et al.'s approach consists of converting rule sets to
FSAs, then generating and testing functions that map from the post- to the prelearning
FSA and property. If the desired function can be found, they apply a theorem from Kurshan
(1994), which guarantees that the learning is "safe." Although no complexity results are
provided, the generate-and-test approach that they describe appears to be computationally
expensive. In contrast to Sekar et al., we have proofs and empirical evidence that our
methods are efficient and, in some cases, that they are substantially more efficient than
total reverification from scratch.

There is also related research in the field of classical planning. In particular, Weld and
Etzioni (1994) have a method to incrementally test an agent's plan to decide whether to
add new actions to the plan. Actions are added only when their effects do not violate a
certain type of Invariance property. Their method has some similarities with our $Inc_{gen-I}$
algorithm. One difference is that our method is for reactive rather than projective plans.
Another is that our verification method is expressed using the formal foundations in the
model checking literature.

As mentioned in the introduction of this paper, FSAs have been shown to be effective
representations of reactive agent plans/strategies (Burkhard, 1993; Kabanza, 1995; Carmel
& Markovitch, 1996; Fogel, 1996). FSA plans have been used both for multiagent competition
and coordination. For example, Fogel's (1996) co-evolving FSA agents for competitive
game playing were mentioned above. A similarity with our work is that Fogel assumes
agents' plans are expressed as $\omega$-automata. Nevertheless, Fogel never discusses verification
of these plans. Goldman and Rosenschein (1994) present a method for multiagent coordination
that assumes FSA plans. Multiple agents cooperate by taking actions to favorably
alter their environment. The cooperation strategy is implemented by a plan developer who
manually edits the FSAs. The relationship to the work here is that they present FSA
transformations that ensure multiagent coordination. Likewise, in our research, a learning
operator that is a priori guaranteed "safe" for some multiagent coordination property
transforms the FSA while ensuring coordination. Although both their method and ours
guarantee this coordination, their solution is manual whereas ours is entirely automated.

Some of the more recent research on agent coordination applies formal verification meth- 
ods. For example, Lee and Durfee (1997) model their agents' semantics with a formalism 
similar to Petri nets (rather than FSAs). They verify synchronization (Invariance) proper- 
ties, which prevent deadlock, using model checking. Furthermore, Lee and Durfee suggest 
recovery from failed verification using two methods: concept learning, and a method analo- 
gous to that used by Ramadge and Wonham (1989). Burkhard (1993) and Kabanza (1995) 
assume agent plans are represented as ω-automata, and they address issues of model checking 
temporal logic properties of the joint (multiagent) plans. Thus there is a growing 
precedent for addressing multiagent coordination by expressing plans as ω-automata and 
verifying them with model checking. Our work builds on this precedent, and also extends 
it, because none of this previous research addresses efficient reverification for agents that 
learn. 

Finally, there are alternative methods for constraining the behavior of agents, which 
are complementary to reverification and self-repair. For example, Shoham and Tennenholtz 
(1995) design agents that obey social laws, e.g., safety conventions, by restricting the agents' 
actions. Nevertheless, the plan designer may not be able to anticipate and engineer all laws 
into the agents beforehand, especially if the agents have to adapt. One solution is to use 
laws that allow maximum flexibility (Fitoussi & Tennenholtz, 1998). However, this solution 
does not allow for certain changes in the plan, such as the addition or deletion of actions. 
An appealing alternative would be to couple initial engineering of social laws with efficient 
reverification after learning. 

A method for ensuring physically bounded behavior of agents is "artificial physics" 
(Spears & Gordon, 1999). With artificial physics, multiagent behavior is restricted by 
artificial forces between the agents. Nevertheless, when encountering severe unanticipated 
circumstances, artificial physics needs to be complemented with reverification and "steering" 
for self-repair (Gordon et al., 1999). 
8. Summary and Future Work 

Agent technology is growing rapidly in popularity. To handle real-world domains and in- 
teractions with people, agents must be adaptable, predictable, and rapidly responsive. An 
approach to resolving these potentially conflicting requirements is presented here. In sum- 
mary, we have shown that certain machine learning operators are a priori (with no run-time 
reverification) safe to perform. In other words, when certain desirable properties hold prior 
to learning, they are guaranteed to hold post-learning. The property classes considered here 
are Invariance and Response. Learning operators $o_{delete}$, $o_{spec}$, $o_{delete+spec}$, and $o_{delete-action}$ 
were found to preserve properties in either of these classes. For $SIT_{1agent}$ and $SIT_{1plan}$, 
where there is a single (multi)agent FSA plan, $o_{delete+gen}$, $o_{spec+gen}$, and $o_{stay}$ were found to 
preserve Invariance properties. All of the a priori results are independent of the size of the 
FSA and are therefore applicable to any FSA that has been model checked originally. 

We then discussed transformations of learning operators and their corresponding a priori 
results to a product plan. This addresses $SIT_{multplans}$, where multiple agents each have their 
own plan but the multiagent plan must be re-formed and reverified to determine whether 
multiagent properties are preserved. It was discovered that only $o_{delete}$, $o_{spec}$, $o_{delete+spec}$, 
and $o_{delete-action}$ preserve their a priori results for this situation. 

Finally, we presented novel incremental reverification algorithms for all cases in which the 
a priori results are negative. It was shown in both theoretical and empirical comparisons that 
these algorithms can substantially improve the time complexity of reverification over total 
reverification from scratch. Empirical results showed as much as a ^-billion-fold speedup. 
These are initial results, but continued research along these lines will likely be applicable to 
a wide range of important problems, including a variety of agent domains as well as more 
general software applications. 

When learning is required, we suggest that the a priori results should be consulted 
first. If no positive result exists (i.e., no result showing that the learning operator is an 
SML), then incremental reverification proceeds. 
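
The recommended order of checks can be sketched in code. The following Python fragment is a minimal illustration, not the implementation used in the experiments. The table `A_PRIORI_SAFE` is a transcription of the positive results summarized in this section; the function name `reverify_after_learning` and the callback `incremental_check` are hypothetical names introduced here. The sketch assumes that the property held before the learning operator was applied.

```python
# Operators reported above as preserving both Invariance and Response.
BOTH_CLASSES = frozenset({"delete", "spec", "delete+spec", "delete-action"})
# Operators reported above as preserving Invariance in the single-plan situations.
INVARIANCE_ONLY = frozenset({"delete+gen", "spec+gen", "stay"})

# (situation, property class) -> operators with a positive a priori result.
A_PRIORI_SAFE = {
    ("1agent", "Invariance"): BOTH_CLASSES | INVARIANCE_ONLY,
    ("1agent", "Response"): BOTH_CLASSES,
    ("1plan", "Invariance"): BOTH_CLASSES | INVARIANCE_ONLY,
    ("1plan", "Response"): BOTH_CLASSES,
    ("multplans", "Invariance"): BOTH_CLASSES,
    ("multplans", "Response"): BOTH_CLASSES,
}


def reverify_after_learning(situation, prop_class, operator, incremental_check):
    """Return (property_holds, method_used) after one learning operator.

    `incremental_check` is a hypothetical zero-argument callable standing in
    for an incremental reverification algorithm; it returns True or False.
    """
    safe = A_PRIORI_SAFE.get((situation, prop_class), frozenset())
    if operator in safe:
        # Positive a priori result: no run-time reverification is needed.
        return True, "a priori"
    # Negative or unknown a priori result: fall back to incremental reverification.
    return incremental_check(), "incremental"
```

If the incremental check fails, the plan would then be handed to plan repair, which is outside the scope of this sketch.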

To test our overall framework, we have implemented the rovers example of this paper 
as co-evolving agents assuming $SIT_{multplans}$, i.e., multiple agents each with its own plan. 
By using the a priori results and incremental algorithms, we achieved significant speedups. 
We have also developed a more sophisticated application that uses reverification during 
evolution. Two agents compete in a board game, and one of the agents evolves its strategy to 
improve it. The key lesson that has been learned from this implementation is that although 
the types of FSAs and learning operators are slightly different from those presented in this 
paper, and the property is quite different (it is a check for a certain type of cyclic behavior 
on the board), initial experiences show that the methodology and basic results here could 
potentially be easily extended to a variety of multiagent applications. 

Future work will focus primarily on extending the a priori results to other learning 
operators/methods and property classes, developing other incremental reverification algo- 
rithms, and exploring plan repair to recover from reverification failures. One way in which 
the a priori results might be extended is by discovering when learning operators will make 
a property true, even if it was not true before learning. 

A question that was not addressed here is whether the incremental methods are useful if 
multiple machine learning operators are applied in batch (e.g., as one might wish to do with 
operator $o_{add-action}$). In the future we would like to explore how to handle this situation: 
is it more efficient to treat the operators as having been applied one at a time and use 
incremental reverification for each? Or is total reverification from scratch preferable? Or, 
better yet, can we develop efficient incremental algorithms for sets of learning operators? 

Plan repair was not discussed in this paper and is an important future direction. The 
research of De Raedt and Bruynooghe (1994), which uses counterexamples to guide the 
revision of theories subject to integrity constraints, may provide some ideas. There are 
also plan repair methods in the classical planning literature that might be relevant to our 
approach (Joslin & Pollack, 1994; Weld & Etzioni, 1994). It would be interesting to compare 
the time to repair plans versus trying another learning operator and reverifying. 

A limitation of our approach is that it does not handle stochastic plans or properties 
with time limits, e.g., a Response property for which the response must occur within a 
specified time after the trigger. We would like to extend this research to stochastic FSAs 
(Tzeng, 1992) and timed FSAs/properties (Alur & Dill, 1994; Kabanza, 1995), as well 
as other common agent representations besides FSAs. Another direction for future work 
would be to extend our results to symbolic model checking, which uses binary decision 
diagrams (BDDs) so that the full state space need not be explicitly explored during model 
checking (Burch et al., 1994). In some cases, symbolic model checking can produce dramatic 
speedup. However, none of the current research on symbolic model checking addresses 
adaptive systems. 

Additionally, the ideas here are applicable to some of the FSA-based control theory 
work. For example, Ramadge and Wonham (1989) assume FSA representations for both the 
plant (which is assumed to be a discrete-event system) and the supervisor (which controls 
the actions of the plant). We are currently applying some of the principles of efficient 
reverification to change the supervisor in response to changes in the plant in a manner that 
preserves properties (Gordon & Kiriakidis, 2000). 

Finally, future work should focus on studying how to operationalize Asimov's Laws for 
intelligent agents. What sorts of properties best express these laws? Weld and Etzioni 
(1994) provide some initial suggestions, but much more remains to be done. 
Appendix A. Glossary of Notation 



$\models$                      Models (satisfies)
model checking                 A verification method entailing brute-force search
AT                             Automata-theoretic model checking
$SIT_{1agent}$                 Single agent situation
$SIT_{1plan}$                  Multiagent situation where each agent uses a multiagent plan
$SIT_{multplans}$              Multiagent situation where each agent uses an individual plan
FSA                            Finite-state automaton
$V(S)$                         The set of states (vertices) of FSA $S$
$E(S)$                         The set of state-to-state transitions (edges) of FSA $S$
transition condition           Logical description of the set of actions enabling a transition
$\mathcal{K}$                  A Boolean algebra
$\le$                          Boolean algebra partial order; $x \le y$ iff $x \wedge y = x$
$M_{\mathcal{K}}(S)$           The matrix of transition conditions of FSA $S$
$M_{\mathcal{K}}(v_i, v_j)$    Transition condition associated with edge $(v_i, v_j)$
$I(S)$                         The set of initial states of FSA $S$
atoms                          Primitive elements of a Boolean algebra; atoms are actions
string                         Sequence of actions (atoms)
$\mathcal{L}(S)$               The language of (set of strings accepted by) FSA $S$
ω-automaton                    An FSA that accepts infinite-length strings
run                            The sequence of FSA vertices visited by a string
accepting run                  The run of a string in the FSA language
acceptance criterion           A requirement of accepting runs of an FSA
$\otimes$                      The tensor (synchronous) product of FSAs
complete FSA                   Specifies a transition for every possible action
deterministic FSA              The choice of action uniquely determines the next state
path                           Sequence of vertices connected by edges
cycle                          A path with start and end vertices identical
c-state                        Computational state; an action occurring in a computation
accessible from                There exists a path from
$\Box$                         Temporal logic "invariant"
$\Diamond$                     Temporal logic "eventually"
Invariance property            $\Box \neg p$, i.e., "Invariant not $p$"
Response property              $\Box (p \rightarrow \Diamond q)$, i.e., "Every $p$ is eventually followed by $q$"
First-Response property        The first $p$ (trigger) is followed by a $q$ (response)
$B(S)$                         The set of "bad" (to be avoided) states of FSA $S$
$\uparrow$                     Can increase accessibility
[symbol illegible]             Cannot increase accessibility
[symbol illegible]             Can decrease accessibility
[symbol illegible]             Cannot decrease accessibility
SML                            Safe machine learning operator, i.e., preserves properties
sound algorithm                One that is correct when it states that $S \models P$
complete algorithm             One that is correct when it states that $S \not\models P$
$\delta$                       The FSA transition function
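
Several of the glossary entries can be made concrete with a small data structure. The following Python sketch is illustrative only and rests on a simplifying assumption: each transition condition is modeled as a plain set of atoms (actions) rather than as an element of a general Boolean algebra. The class name `FSA`, its method names, and the two-state example are hypothetical and do not come from the paper.

```python
from collections import deque
from itertools import combinations


class FSA:
    """Toy FSA whose transition conditions are sets of atoms (actions)."""

    def __init__(self, atoms, initial, conditions):
        self.atoms = frozenset(atoms)      # every possible action
        self.initial = frozenset(initial)  # I(S)
        # Condition for edge (v_i, v_j); an empty condition means no edge.
        self.cond = {e: frozenset(c) for e, c in conditions.items() if c}

    def vertices(self):
        # V(S): initial states plus both endpoints of every edge.
        return {v for edge in self.cond for v in edge} | self.initial

    def out_conditions(self, v):
        return [c for (src, _), c in self.cond.items() if src == v]

    def is_complete(self):
        # Every vertex specifies a transition for every possible action.
        return all(
            frozenset().union(*self.out_conditions(v)) == self.atoms
            for v in self.vertices()
        )

    def is_deterministic(self):
        # No action enables two different outgoing edges of the same vertex.
        return all(
            a.isdisjoint(b)
            for v in self.vertices()
            for a, b in combinations(self.out_conditions(v), 2)
        )

    def accessible_from(self, start):
        # Breadth-first search; the result includes `start` itself.
        seen, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for (src, dst) in self.cond:
                if src == v and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        return seen


# Hypothetical two-state plan over two actions.
rover = FSA(
    atoms={"collect", "deliver"},
    initial={"A"},
    conditions={
        ("A", "A"): {"collect"},
        ("A", "B"): {"deliver"},
        ("B", "A"): {"collect", "deliver"},
    },
)
```

In this example both `rover.is_complete()` and `rover.is_deterministic()` hold, because the outgoing conditions of each vertex cover all atoms and do not overlap.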
Appendix B. Temporal logic properties 

This appendix, which is based on Manna and Pnueli (1991), formally defines Invariance and 
Response properties in temporal logic. We begin by defining the basic temporal operator $U$ 
(Until). We assume a string $(x_0, \ldots)$ of c-states of FSA $S$, where $0 \le i, j, k$. Then for c-state 
formulae $p$ and $q$, we define Until as $x_j \models p \, U \, q \iff$ for some $k \ge j$, $x_k \models q$, and for every $i$ 
such that $j \le i < k$, $x_i \models p$. 

Invariance properties are defined in terms of Eventually properties, so we define Eventually 
first. For c-state formula $p$ and FSA $S$, we define property $P = \Diamond p$ ("Eventually $p$") 
as a property that is true (false) for a string if it is true (false) at the initial c-state $x_0$ of 
the string. Formally, if $x = (x_0, \ldots)$ is a string of FSA $S$, then $x \models \Diamond p \iff x_0 \models \mathit{true} \, U \, p$, 
i.e., "eventually $p$." A property $P = \Box \neg p$ ("Invariant not $p$") is defined as $x \models \Box \neg p \iff x \models \neg \Diamond p$, 
i.e., "never $p$." Finally, a Response formula is of the form $\Box (p \rightarrow \Diamond q)$, where $p$ 
is called the "trigger" and $q$ the "response." A Response formula states that every trigger 
is eventually followed by a response. 
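
As a reading aid, the definitions above can be unfolded into explicit quantifiers over the positions of a string $x = (x_0, x_1, \ldots)$. The following is a sketch that assumes the standard semantics in which $\Box$ and $\Diamond$ are evaluated at every suffix of the string. The position-indexed form of the Response formula is an unfolding under that assumption, not a formula stated in this appendix.

```latex
\begin{align*}
x \models \Diamond p
  &\iff x_0 \models \mathit{true} \; U \; p
   \iff \exists k \ge 0 .\; x_k \models p, \\
x \models \Box \neg p
  &\iff x \models \neg \Diamond p
   \iff \forall k \ge 0 .\; x_k \not\models p, \\
x \models \Box (p \rightarrow \Diamond q)
  &\iff \forall j \ge 0 .\; \bigl( x_j \models p \implies \exists k \ge j .\; x_k \models q \bigr).
\end{align*}
```

The first line uses the fact that $\mathit{true}$ holds at every c-state, so the second conjunct of the Until definition is satisfied trivially.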

Appendix C. Properties for Experiments 

The following five Invariance properties were used in the test suite: 

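$\Box\,(\neg(\text{I-deliver} \wedge \text{L-transmit}))$ 

$\Box\,(\neg(\text{I-deliver} \wedge \text{L-pause}))$ 

$\Box\,(\neg(\text{F-collect} \wedge \text{I-deliver}))$ 

$\Box\,(\neg(\text{F-collect} \wedge \text{I-deliver} \wedge \text{L-receive}))$ 

$\Box\,(\neg(\text{F-deliver} \wedge \text{I-receive} \wedge \text{L-pause}))$ 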

The following five Response properties were used in the test suite: 

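$\Box\,(\text{F-deliver} \rightarrow \Diamond\, \text{L-receive})$ 

$\Box\,(\text{F-deliver} \rightarrow \Diamond\, \text{I-receive})$ 

$\Box\,(\text{F-collect} \rightarrow \Diamond\, \text{L-transmit})$ 

$\Box\,((\text{F-collect} \wedge \text{I-deliver}) \rightarrow \Diamond\, \text{L-receive})$ 

$\Box\,(\text{F-deliver} \rightarrow \Diamond\, (\text{I-receive} \wedge \text{L-receive}))$ 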
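
To illustrate what properties of these two forms require of a single string, the following Python sketch evaluates them on an ultimately periodic string, i.e., a finite prefix followed by a cycle repeated forever. It is not one of the verification algorithms of this paper. It assumes that each c-state is modeled as a set of action names, one per agent, and that the trigger and response are conjunctions of actions; the function names and the example string are hypothetical.

```python
def holds_invariance(prefix, cycle, forbidden):
    """Invariance: no c-state contains all of the `forbidden` actions."""
    return not any(forbidden <= state for state in prefix + cycle)


def holds_response(prefix, cycle, trigger, response):
    """Response: every c-state containing `trigger` is followed, at the
    same or a later position, by a c-state containing `response`."""
    if any(response <= state for state in cycle):
        # The response recurs forever, so every trigger is answered.
        return True
    if any(trigger <= state for state in cycle):
        # A trigger recurs forever but no response ever follows it.
        return False
    # Triggers can occur only in the prefix and must be answered there.
    return all(
        any(response <= later for later in prefix[i:])
        for i, state in enumerate(prefix)
        if trigger <= state
    )


# Hypothetical string for three agents F, I, and L: one prefix c-state,
# then a two-state cycle that repeats forever.
prefix = [{"F-collect", "I-receive", "L-pause"}]
cycle = [
    {"F-deliver", "I-receive", "L-transmit"},
    {"F-collect", "I-deliver", "L-receive"},
]

inv_first = holds_invariance(prefix, cycle, {"I-deliver", "L-transmit"})
inv_third = holds_invariance(prefix, cycle, {"F-collect", "I-deliver"})
resp_first = holds_response(prefix, cycle, {"F-deliver"}, {"L-receive"})
```

On this string the first Invariance property and the first Response property hold, while the third Invariance property fails because the last c-state of the cycle contains both F-collect and I-deliver.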